diff --git a/environment.md b/environment.md
new file mode 100644
index 0000000..cba57d3
--- /dev/null
+++ b/environment.md
@@ -0,0 +1,216 @@
+# Game Design Document: **Honey Heist: Battle of the Bears**
+
+---
+
+## 1. Concept Paragraph
+
+**Honey Heist: Battle of the Bears** is a deterministic, turn-based strategy game where two rival bears compete in a forest clearing to collect the most honey from a shared hive before it is depleted. Each bear must decide whether to **forage**, **defend**, or **steal** on its turn, balancing risk against reward. The environment features a finite supply of honey, and both bears act in partial opposition: every move affects the shared resource and their honey stores.
+This design is **entirely unrelated to any negotiation or trading game**. It focuses instead on resource competition and strategic timing in an original woodland theme.
+
+---
+
+## 2. Roles and Win Condition
+
+- **Players:** Two rival bears: **Bear A** and **Bear B**.
+- **Objective:** End the game with **more honey than the opponent** when the hive is empty or the maximum number of turns is reached.
+- **Win Condition:**
+  - **Win:** A bear has more honey than the other at the end of the game.
+  - **Loss:** A bear has less honey than the other when the game ends.
+  - **Draw:** Both bears have equal honey at game end.
+
+---
+
+## 3. Turn Structure and Determinism
+
+- The game proceeds in **alternating turns** beginning with Bear A.
+- On each turn, the current player submits **one valid action token**.
+- Each turn’s outcome is **deterministic**, derived exclusively from the game state and player actions.
+- The game lasts until either:
+  1. The **hive’s honey** reaches **0**, or
+  2. A **turn limit** (default = 10 turns per bear, 20 turns total) is reached.
+- A fixed **seed** ensures deterministic replay: it sets the hive’s starting honey, and every later outcome follows from the game state and the submitted actions alone.
+
+---
+
+## 4. Action Grammar (Machine-Parseable)
+
+### Allowed Player Actions
+
+| Action Token | Pattern Format | Description |
+|---------------|----------------|--------------|
+| `[Forage:X]` | `^\[Forage:[1-3]\]$` | Gather **X** units of honey from the hive (reduces hive honey by X). |
+| `[Defend]` | `^\[Defend\]$` | Protect your stored honey; blocks a steal attempt while your `defending` flag is set. |
+| `[Steal:X]` | `^\[Steal:[1-3]\]$` | Attempt to steal **X** honey from the opponent’s store (only effective if they did not defend). |
+
+### Examples
+
+**Valid:**
+- `[Forage:2]` → Forages 2 honey from hive.
+- `[Steal:3]` → Attempts to steal 3 honey from the opponent.
+- `[Defend]` → Adopts a defensive stance.
+
+**Invalid:**
+- `[Forage:5]` → Invalid, X must be within 1–3.
+- `[Steal honey]` → Invalid format; must use integer param.
+- `[Hide]` → Invalid token; not part of the grammar.
+
+---
+
+## 5. Game State Schema
+
+```json
+{
+  "turn_number": 7,
+  "current_player": "BearB",
+  "hive_honey": 12,
+  "max_turns": 20,
+  "players": {
+    "BearA": {
+      "stored_honey": 9,
+      "last_action": "[Forage:2]",
+      "defending": false,
+      "score": 9
+    },
+    "BearB": {
+      "stored_honey": 10,
+      "last_action": "[Steal:3]",
+      "defending": false,
+      "score": 10
+    }
+  },
+  "history": [
+    {"turn":1,"actor":"BearA","action":"[Forage:3]"},
+    {"turn":2,"actor":"BearB","action":"[Defend]"},
+    {"turn":3,"actor":"BearA","action":"[Steal:2]"}
+  ],
+  "winner": null,
+  "draw": false,
+  "seed": 12345
+}
+```
+
+---
+
+## 6. Initialization Rules
+
+- **Hive honey** starts between **15 and 20**, seeded deterministically.
+- **Stored honey** for each player starts at **0**.
+- **Turn number** begins at **1**, with **BearA** going first.
+- A random seed determines only initial hive honey; all later effects are deterministic.
+- On **reset()**, both players receive initial observations describing:
+  - Starting hive size
+  - Their honey score (=0)
+  - Available action grammar
+
+---
+
+## 7. Validation and Error Handling
+
+### Illegal Actions
+The environment validates the extracted content (from within `\boxed{}`).
+
+**Invalid Move Reasons:**
+1. Action string **empty or malformed** → `"Invalid format, must use [Forage:X], [Steal:X], or [Defend]."`
+2. **X out of range** for `[Forage:X]` or `[Steal:X]` → `"Illegal quantity, X must be 1–3."`
+3. Attempt to forage more honey than exists in hive → `"Not enough honey in hive."`
+4. Attempt to steal more honey than opponent has → `"Opponent has insufficient honey."`
+5. Action submitted after terminal state → `"Game is already over."`
+
+When validation fails, the environment triggers `set_invalid_move(reason)` and ends the turn without effect.
+
+---
+
+## 8. Terminal Conditions and Scoring
+
+### Checked after each pair of turns (full round):
+
+1. **Hive depletion:** if `hive_honey <= 0`, game ends.
+2. **Turn limit:** if `turn_number > max_turns`, game ends.
+3. **All honey gathered:** if total honey = 0 (hive + players), end early.
+
+**Scoring:**
+- Each bear’s score = `stored_honey`.
+- **Winner = bear with higher stored_honey.**
+- **Draw** if equal.
+
+**Tie-breaker:** If equal and turns remain, continue until max_turns; if still equal at end → draw.
+
+---
+
+## 9. Player Prompt Specification
+
+Each player receives a contextual prompt describing their role and the rules:
+
+---
+
+### Prompt Outline
+
+```
+You are a hungry bear competing for the last honey in the forest.
+- Your goal: End the game with more honey than your rival.
+- Each turn, choose ONE of the following actions:
+  [Forage:X]  Gather X units (1–3) from the hive.
+  [Defend]    Protect your honey from theft this turn.
+  [Steal:X]   Steal X units (1–3) from your rival if they do not defend.
+
+Game facts:
+- Hive honey remaining: {hive_honey}
+- Your stored honey: {player_honey}
+- Rival stored honey: {rival_honey}
+- Turn {turn_number} / {max_turns}
+
+Format rule:
+State your reasoning briefly, then put your final action in the following format at the end:
+
+"Put your final answer within \\boxed{{}} at the end of your response."
+
+Example valid response:
+I will risk a quick grab of honey before the hive gets low.
+\boxed{{[Forage:3]}}
+
+Example invalid response:
+\boxed{{Forage 3}}  <-- Must include brackets and colon.
+```
+
+---
+
+A helper `_extract_answer_content(self, action: str) -> str` will strip the outer `\boxed{}` to obtain the core action token for parsing and validation.
+
+---
+
+## 10. API Mapping Plan
+
+### `reset(seed)`
+- Initialize `game_state` fields as specified.
+- Apply seed to determine initial `hive_honey`.
+- Set both `stored_honey` = 0, `defending`=False.
+- Return the first observation for **BearA**.
+
+### `step(action)`
+- Extract content from player’s input using `_extract_answer_content`.
+- Validate; if illegal, set invalid move and skip reward changes.
+- Apply deterministic resolution:
+  - `[Forage:X]`: Reduce hive honey by X, add to player’s store.
+  - `[Defend]`: Set defending=True.
+  - `[Steal:X]`: If opponent.defending=False, transfer min(X, opponent.stored_honey) to current player.
+- Reset defending flags at the end of round.
+- Increment `turn_number`, switch `current_player`.
+- Check terminal conditions and set `winner`/`draw` if applicable.
+- Produce updated observation for next player.
+
+### `_generate_player_prompt(player_id)`
+- Inserts current state data into the prompt template (see section 9).
+- Explains available actions, format rules, and examples.
+- Used each turn to query model actions in deterministic play.
+
+---
+
+## 11. Copy-Check Against Example
+
+This design:
+- Uses **bears, honey, and forest resources**, not trade, offers, or negotiation.
+- Has **unique resource names**: `hive_honey`, `stored_honey`, `defending`, etc.
+- Defines **original objectives** (collect more honey) and **distinct action grammar**.
+- Uses distinct theme and prompt text centered on **competitive foraging bears**.
+
+All terminology and `game_state` keys are purely original to *Honey Heist: Battle of the Bears*.
\ No newline at end of file
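
As a sanity check on the spec in this diff, the action grammar (section 4), the validation messages (section 7), and the resolution rules for `step(action)` (section 10) can be sketched in Python roughly as follows. This is a minimal sketch, not the environment's implementation: the function names `extract_answer_content`, `parse_action`, and `apply_action` are hypothetical, the state is a plain dict shaped like the section 5 schema, and turn advancement, defending-flag resets, and terminal checks are deliberately left out.

```python
import re
from typing import Optional, Tuple

# Patterns taken from the section 4 grammar; X is restricted to 1..3.
FORAGE_RE = re.compile(r"^\[Forage:([1-3])\]$")
STEAL_RE = re.compile(r"^\[Steal:([1-3])\]$")
DEFEND_RE = re.compile(r"^\[Defend\]$")
# Well-formed token whose quantity may still be out of range (assumed helper).
QUANTITY_RE = re.compile(r"^\[(?:Forage|Steal):-?\d+\]$")


def extract_answer_content(response: str) -> str:
    """Return the content of the last \\boxed{...} in a response, or ''."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else ""


def parse_action(token: str) -> Optional[Tuple[str, int]]:
    """Map a valid token to (kind, X); return None for anything else."""
    if DEFEND_RE.match(token):
        return ("Defend", 0)
    for kind, pattern in (("Forage", FORAGE_RE), ("Steal", STEAL_RE)):
        match = pattern.match(token)
        if match:
            return (kind, int(match.group(1)))
    return None


def apply_action(state: dict, token: str) -> Optional[str]:
    """Apply one action for state['current_player'].

    Returns a section 7 error message (state untouched) or None on success.
    """
    me = state["current_player"]
    rival = "BearB" if me == "BearA" else "BearA"
    mine, theirs = state["players"][me], state["players"][rival]

    parsed = parse_action(token)
    if parsed is None:
        if QUANTITY_RE.match(token):
            return "Illegal quantity, X must be 1–3."
        return "Invalid format, must use [Forage:X], [Steal:X], or [Defend]."

    kind, x = parsed
    if kind == "Forage":
        if x > state["hive_honey"]:
            return "Not enough honey in hive."
        state["hive_honey"] -= x
        mine["stored_honey"] += x
    elif kind == "Defend":
        mine["defending"] = True
    else:  # Steal
        if x > theirs["stored_honey"]:
            return "Opponent has insufficient honey."
        if not theirs["defending"]:  # a defending rival blocks the steal
            theirs["stored_honey"] -= x
            mine["stored_honey"] += x
    return None
```

Because rule 4 in section 7 rejects a steal larger than the rival's store, the `min(X, ...)` clamp in section 10 never changes the amount in this sketch, so the transfer uses `x` directly.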