Add environment documentation from Openverse builder
216
environment.md
Normal file
@@ -0,0 +1,216 @@
# Game Design Document: **Honey Heist: Battle of the Bears**

---

## 1. Concept Paragraph

**Honey Heist: Battle of the Bears** is a deterministic, turn-based strategy game in which two rival bears compete in a forest clearing to collect the most honey from a shared hive before it is depleted. On each turn, a bear must decide whether to **forage**, **defend**, or **steal**, balancing risk against reward. The hive holds a finite supply of honey, and the bears act in partial opposition: every move changes the shared hive or one of the honey stores.
This design is **entirely unrelated to any negotiation or trading game**. It focuses instead on resource competition and strategic timing in an original woodland theme.

---

## 2. Roles and Win Condition

- **Players:** Two rival bears: **Bear A** and **Bear B**.
- **Objective:** End the game with **more honey than the opponent** when the hive is empty or the maximum number of turns is reached.
- **Win Condition:**
  - **Win:** A bear has more honey than the other at the end of the game.
  - **Loss:** A bear has less honey than the other when the game ends.
  - **Draw:** Both bears have equal honey at game end.

---

## 3. Turn Structure and Determinism

- The game proceeds in **alternating turns** beginning with Bear A.
- On each turn, the current player submits **one valid action token**.
- Each turn’s outcome is **deterministic**, derived exclusively from the game state and player actions.
- The game lasts until either:
  1. The **hive’s honey** reaches **0**, or
  2. A **turn limit** (default = 10 turns per bear, 20 turns total) is reached.
- A fixed **seed** ensures deterministic replay: it fixes the hive’s starting honey, which is the only randomized quantity, so the same seed and the same action sequence always reproduce the same game.

---

## 4. Action Grammar (Machine-Parseable)

### Allowed Player Actions

| Action Token | Pattern Format | Description |
|---------------|----------------|--------------|
| `[Forage:X]` | `^\[Forage:[1-3]\]$` | Gather **X** units of honey from the hive (reduces hive honey by X). |
| `[Defend]` | `^\[Defend\]$` | Protect your stored honey; steal attempts against you fail while the defense is active. |
| `[Steal:X]` | `^\[Steal:[1-3]\]$` | Attempt to steal **X** honey from the opponent’s store (only effective if they did not defend). |

### Examples

**Valid:**
- `[Forage:2]` → Forages 2 honey from hive.
- `[Steal:3]` → Attempts to steal 3 honey from the opponent.
- `[Defend]` → Adopts a defensive stance.

**Invalid:**
- `[Forage:5]` → Invalid, X must be within 1–3.
- `[Steal honey]` → Invalid format; must use integer param.
- `[Hide]` → Invalid token; not part of the grammar.
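
As a rough illustration, the grammar above could be parsed with Python's `re` module as sketched below. The function name `parse_action` and the `(kind, amount)` return shape are assumptions made for this example, not part of the design; the character class `[1-3]` accepts the same digits as the alternation `(?:1|2|3)`.

```python
import re
from typing import Optional, Tuple

# Patterns follow the action table; [1-3] accepts the same digits as (?:1|2|3).
FORAGE_RE = re.compile(r"^\[Forage:([1-3])\]$")
STEAL_RE = re.compile(r"^\[Steal:([1-3])\]$")
DEFEND_RE = re.compile(r"^\[Defend\]$")


def parse_action(token: str) -> Optional[Tuple[str, int]]:
    """Return (kind, amount) for a valid token, or None if it is outside the grammar.

    Hypothetical helper: the name and the tuple shape are illustrative only.
    Defend carries no quantity, so its amount is reported as 0.
    """
    match = FORAGE_RE.match(token)
    if match:
        return ("forage", int(match.group(1)))
    match = STEAL_RE.match(token)
    if match:
        return ("steal", int(match.group(1)))
    if DEFEND_RE.match(token):
        return ("defend", 0)
    return None
```

With this sketch, the three valid examples parse to a tuple and the three invalid examples all return `None`.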

---

## 5. Game State Schema

```json
{
  "turn_number": 7,
  "current_player": "BearB",
  "hive_honey": 12,
  "max_turns": 20,
  "players": {
    "BearA": {
      "stored_honey": 9,
      "last_action": "[Forage:2]",
      "defending": false,
      "score": 9
    },
    "BearB": {
      "stored_honey": 10,
      "last_action": "[Steal:3]",
      "defending": false,
      "score": 10
    }
  },
  "history": [
    {"turn":1,"actor":"BearA","action":"[Forage:3]"},
    {"turn":2,"actor":"BearB","action":"[Defend]"},
    {"turn":3,"actor":"BearA","action":"[Steal:2]"}
  ],
  "winner": null,
  "draw": false,
  "seed": 12345
}
```

---

## 6. Initialization Rules

- **Hive honey** starts between **15 and 20**, seeded deterministically.
- **Stored honey** for each player starts at **0**.
- **Turn number** begins at **1**, with **BearA** going first.
- A random seed determines only initial hive honey; all later effects are deterministic.
- On `reset()`, both players receive initial observations describing:
  - Starting hive size
  - Their honey score (=0)
  - Available action grammar
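
A minimal sketch of these initialization rules follows, assuming the state is held as a plain dictionary shaped like the schema in section 5. The function name `initial_state` and the choice of `None` for an unset `last_action` are assumptions for illustration. A private `random.Random(seed)` instance keeps the draw reproducible without touching global random state.

```python
import random
from typing import Any, Dict


def initial_state(seed: int, max_turns: int = 20) -> Dict[str, Any]:
    """Build a starting state; the seed influences only the initial hive honey."""
    rng = random.Random(seed)  # private generator, so replay depends on the seed alone

    def new_bear() -> Dict[str, Any]:
        # Assumption: last_action is None until the bear has acted.
        return {"stored_honey": 0, "last_action": None, "defending": False, "score": 0}

    return {
        "turn_number": 1,
        "current_player": "BearA",
        "hive_honey": rng.randint(15, 20),  # randint is inclusive at both ends
        "max_turns": max_turns,
        "players": {"BearA": new_bear(), "BearB": new_bear()},
        "history": [],
        "winner": None,
        "draw": False,
        "seed": seed,
    }
```

Calling `initial_state` twice with the same seed yields equal states, which is the replay property the rules ask for.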

---

## 7. Validation and Error Handling

### Illegal Actions
The environment validates the extracted content (from within `\boxed{}`).

**Invalid Move Reasons:**
1. Action string **empty or malformed** → `"Invalid format, must use [Forage:X], [Steal:X], or [Defend]."`
2. **X out of range** for `[Forage:X]` or `[Steal:X]` → `"Illegal quantity, X must be 1–3."`
3. Attempt to forage more honey than exists in hive → `"Not enough honey in hive."`
4. Attempt to steal more honey than opponent has → `"Opponent has insufficient honey."`
5. Action submitted after terminal state → `"Game is already over."`

When validation fails, the environment triggers `set_invalid_move(reason)` and ends the turn without effect.
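
The five reasons above could be checked in one pass, as in the sketch below. The function name `validate`, its signature, and the order of the checks (terminal state first) are assumptions for this example; the reason strings are copied from the list. A looser pattern than the grammar is used on purpose so that an out-of-range quantity such as `[Forage:5]` can be told apart from a malformed token.

```python
import re
from typing import Any, Dict, Optional

# Accepts any integer quantity so that range errors get their own message.
SHAPE_RE = re.compile(r"^\[(Forage|Steal):(-?\d+)\]$")


def validate(token: str, state: Dict[str, Any], actor: str) -> Optional[str]:
    """Return the invalid-move reason, or None when the action is legal."""
    if state["winner"] is not None or state["draw"]:
        return "Game is already over."
    if token == "[Defend]":
        return None
    match = SHAPE_RE.match(token)
    if match is None:
        return "Invalid format, must use [Forage:X], [Steal:X], or [Defend]."
    kind, amount = match.group(1), int(match.group(2))
    if not 1 <= amount <= 3:
        return "Illegal quantity, X must be 1–3."
    if kind == "Forage" and amount > state["hive_honey"]:
        return "Not enough honey in hive."
    if kind == "Steal":
        rival = "BearB" if actor == "BearA" else "BearA"
        if amount > state["players"][rival]["stored_honey"]:
            return "Opponent has insufficient honey."
    return None
```

A caller would pass any non-`None` result to `set_invalid_move(reason)` and leave the state unchanged.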

---

## 8. Terminal Conditions and Scoring

### Checked after each pair of turns (full round):

1. **Hive depletion:** if `hive_honey <= 0`, game ends.
2. **Turn limit:** if `turn_number > max_turns`, game ends.
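3. **All honey gathered:** honey only moves from the hive to a store or between stores, so the total (hive + players) stays equal to the starting hive amount and never reaches 0; the early-end case this rule targets is the empty hive already covered by condition 1.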

**Scoring:**
- Each bear’s score = `stored_honey`.
- **Winner = bear with higher stored_honey.**
- **Draw** if equal.

**Tie-breaker:** If the stores are equal and turns remain, play continues until `max_turns`; if they are still equal at the end, the game is a draw.
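
One possible reading of the terminal conditions and the tie-breaker is sketched below. The function name `check_terminal` and its return shape are assumptions. The sketch also assumes that a tie with an empty hive keeps the game running while turns remain, which is how the tie-breaker is interpreted here. When the function is called (for instance, once per full round) is left to the caller.

```python
from typing import Any, Dict, Optional, Tuple


def check_terminal(state: Dict[str, Any]) -> Tuple[bool, Optional[str], bool]:
    """Return (game_over, winner, draw) for the given state."""
    honey_a = state["players"]["BearA"]["stored_honey"]
    honey_b = state["players"]["BearB"]["stored_honey"]
    out_of_turns = state["turn_number"] > state["max_turns"]
    hive_empty = state["hive_honey"] <= 0
    tied = honey_a == honey_b

    # Assumed tie-breaker reading: an empty hive ends the game only when the
    # stores differ; a tie plays on until the turn limit.
    if not out_of_turns and (not hive_empty or tied):
        return (False, None, False)
    if tied:
        return (True, None, True)
    return (True, "BearA" if honey_a > honey_b else "BearB", False)
```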

---

## 9. Player Prompt Specification

Each player receives a contextual prompt describing their role and the rules:

---

### Prompt Outline

```
You are a hungry bear competing for the last honey in the forest.
- Your goal: End the game with more honey than your rival.
- Each turn, choose ONE of the following actions:
[Forage:X] Gather X units (1–3) from the hive.
[Defend] Protect your honey from theft this turn.
[Steal:X] Steal X units (1–3) from your rival if they do not defend.

Game facts:
- Hive honey remaining: {hive_honey}
- Your stored honey: {player_honey}
- Rival stored honey: {rival_honey}
- Turn {turn_number} / {max_turns}

Format rule:
State your reasoning briefly, then put your final action in the following format at the end:

"Put your final answer within \\boxed{{}} at the end of your response."

Example valid response:
I will risk a quick grab of honey before the hive gets low.
\boxed{{[Forage:3]}}

Example invalid response:
\boxed{{Forage 3}} <-- Must include brackets and colon.
```

---

A helper `_extract_answer_content(self, action: str) -> str` will strip the outer `\boxed{}` to obtain the core action token for parsing and validation.
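
A possible shape for that helper is sketched below as a free function. Two behaviours are assumptions that the design does not specify: when several `\boxed{}` groups appear, the last one is used, and when none appears, the stripped input is returned unchanged so that validation can reject it.

```python
import re

# Action tokens contain no braces, so a brace-free capture group is sufficient.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer_content(action: str) -> str:
    r"""Return the text inside the last \boxed{...}, or the stripped input if none exists."""
    matches = BOXED_RE.findall(action)
    if matches:
        return matches[-1].strip()
    return action.strip()
```

For a response ending in `\boxed{[Forage:3]}`, this returns `[Forage:3]`, which can then be checked against the action grammar.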

---

## 10. API Mapping Plan

### `reset(seed)`
- Initialize `game_state` fields as specified.
- Apply seed to determine initial `hive_honey`.
- Set both `stored_honey = 0` and `defending = False`.
- Return the first observation for **BearA**.

### `step(action)`
- Extract content from player’s input using `_extract_answer_content`.
- Validate; if illegal, set invalid move and skip reward changes.
- Apply deterministic resolution:
  - `[Forage:X]`: Reduce hive honey by X, add to player’s store.
  - `[Defend]`: Set `defending = True`.
  - `[Steal:X]`: If `opponent.defending` is `False`, transfer `min(X, opponent.honey)` to current player.
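- Clear the acting bear’s `defending` flag at the start of its own next turn, so that a `[Defend]` protects against the opponent’s intervening action regardless of turn order.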
- Increment `turn_number`, switch `current_player`.
- Check terminal conditions and set `winner`/`draw` if applicable.
- Produce updated observation for next player.
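
The resolution part of `step(action)` might look like the sketch below, which mutates a state dictionary shaped like the schema in section 5. Several points are assumptions: the function name `apply_action`, the requirement that the token has already been extracted and validated, and the rule that a bear's `defending` flag is cleared when that bear next acts, so a defense covers exactly one opposing turn. Terminal checks and observation building are left out.

```python
import re
from typing import Any, Dict

TOKEN_RE = re.compile(r"^\[(?:(Forage|Steal):([1-3])|(Defend))\]$")


def apply_action(state: Dict[str, Any], token: str) -> None:
    """Resolve one already-validated action in place and advance the turn."""
    actor = state["current_player"]
    rival = "BearB" if actor == "BearA" else "BearA"
    me, other = state["players"][actor], state["players"][rival]

    # Assumption: a defense lasts until the defender acts again.
    me["defending"] = False

    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"not a valid action token: {token!r}")
    kind = match.group(1) or match.group(3)
    amount = int(match.group(2) or 0)

    if kind == "Forage":
        state["hive_honey"] -= amount
        me["stored_honey"] += amount
    elif kind == "Defend":
        me["defending"] = True
    elif kind == "Steal" and not other["defending"]:
        taken = min(amount, other["stored_honey"])
        other["stored_honey"] -= taken
        me["stored_honey"] += taken

    for bear in (me, other):
        bear["score"] = bear["stored_honey"]  # score mirrors stored_honey
    me["last_action"] = token
    state["history"].append(
        {"turn": state["turn_number"], "actor": actor, "action": token}
    )
    state["turn_number"] += 1
    state["current_player"] = rival
```

Because the function only reads the state and the token, replaying the same action list from the same starting state always produces the same result.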

### `_generate_player_prompt(player_id)`
- Inserts current state data into the prompt template (see section 9).
- Explains available actions, format rules, and examples.
- Used each turn to query model actions in deterministic play.
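
The doubled braces in the section 9 template suggest it is meant to be filled with Python's `str.format`, where `{{}}` renders as a literal `{}`. The sketch below illustrates that under this assumption; `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt text, and `generate_player_prompt` is written as a free function for brevity.

```python
from typing import Any, Dict

# Abbreviated stand-in for the section 9 template; only the placeholders matter here.
PROMPT_TEMPLATE = (
    "You are a hungry bear competing for the last honey in the forest.\n"
    "- Hive honey remaining: {hive_honey}\n"
    "- Your stored honey: {player_honey}\n"
    "- Rival stored honey: {rival_honey}\n"
    "- Turn {turn_number} / {max_turns}\n"
    "Put your final answer within \\boxed{{}} at the end of your response."
)


def generate_player_prompt(state: Dict[str, Any], player_id: str) -> str:
    """Fill the template from the perspective of player_id."""
    rival_id = "BearB" if player_id == "BearA" else "BearA"
    return PROMPT_TEMPLATE.format(
        hive_honey=state["hive_honey"],
        player_honey=state["players"][player_id]["stored_honey"],
        rival_honey=state["players"][rival_id]["stored_honey"],
        turn_number=state["turn_number"],
        max_turns=state["max_turns"],
    )
```

Each bear sees its own store as "Your stored honey" and the opponent's as "Rival stored honey", so the same state produces a different prompt for each player.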

---

## 11. Copy-Check Against Example

This design:
- Uses **bears, honey, and forest resources**, not trade, offers, or negotiation.
- Has **unique resource names**: `hive_honey`, `stored_honey`, `defending`, etc.
- Defines **original objectives** (collect more honey) and **distinct action grammar**.
- Uses distinct theme and prompt text centered on **competitive foraging bears**.

All terminology and `game_state` keys are purely original to *Honey Heist: Battle of the Bears*.