# Game Design Document: **Honey Heist: Battle of the Bears**
---
## 1. Concept Paragraph
**Honey Heist: Battle of the Bears** is a deterministic, turn-based strategy game where two rival bears compete in a forest clearing to collect the most honey from a shared hive before it is depleted. Each bear must decide whether to **forage**, **defend**, or **steal** on its turn, balancing risk against reward. The environment features a finite supply of honey, and both bears act in partial opposition — every move affects the shared resource and their honey stores.
This design is **entirely unrelated to any negotiation or trading game**. It focuses instead on resource competition and strategic timing in an original woodland theme.
---
## 2. Roles and Win Condition
- **Players:** Two rival bears: **Bear A** and **Bear B**.
- **Objective:** End the game with **more honey than the opponent** when the hive is empty or the maximum number of turns is reached.
- **Win Condition:**
  - **Win:** A bear has more honey than the other at the end of the game.
  - **Loss:** A bear has less honey than the other when the game ends.
  - **Draw:** Both bears have equal honey at game end.
---
## 3. Turn Structure and Determinism
- The game proceeds in **alternating turns** beginning with Bear A.
- On each turn, the current player submits **one valid action token**.
- Each turn's outcome is **deterministic**, derived exclusively from the game state and player actions.
- The game lasts until either:
  1. The **hive's honey** reaches **0**, or
  2. A **turn limit** (default: 10 turns per bear, 20 turns total) is reached.
- A fixed **seed** ensures deterministic replay: the hive's starting honey and any numeric parameters (like attack/defense outcomes) are seeded and reproducible.
---
## 4. Action Grammar (Machine-Parseable)
### Allowed Player Actions
| Action Token | Pattern Format | Description |
|---------------|----------------|--------------|
| `[Forage:X]` | `^\[Forage:[1-3]\]$` | Gather **X** units of honey from the hive (reduces hive honey by X). |
| `[Defend]` | `^\[Defend\]$` | Protect your stored honey; blocks a steal attempt against you this turn. |
| `[Steal:X]` | `^\[Steal:[1-3]\]$` | Attempt to steal **X** honey from the opponent's store (only effective if they did not defend). |
### Examples
**Valid:**
- `[Forage:2]` → Forages 2 honey from hive.
- `[Steal:3]` → Attempts to steal 3 honey from the opponent.
- `[Defend]` → Adopts a defensive stance.
**Invalid:**
- `[Forage:5]` → Invalid, X must be within 1-3.
- `[Steal honey]` → Invalid format; must use integer param.
- `[Hide]` → Invalid token; not part of the grammar.
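
The grammar above can be checked mechanically. The following is a minimal sketch, not the environment's actual parser: the function name `parse_action` and the `(kind, amount)` return shape are assumptions made for illustration. The character class `[1-3]` accepts the same digits as the alternation `1|2|3`, and `fullmatch` is used so that a trailing newline is rejected.

```python
import re
from typing import Optional, Tuple

# Hypothetical names; the patterns follow the grammar table.
_FORAGE = re.compile(r"^\[Forage:([1-3])\]$")
_STEAL = re.compile(r"^\[Steal:([1-3])\]$")
_DEFEND = re.compile(r"^\[Defend\]$")


def parse_action(token: str) -> Optional[Tuple[str, int]]:
    """Return (kind, amount) for a valid token, or None if it is not in the grammar."""
    if _DEFEND.fullmatch(token):
        return ("defend", 0)  # Defend carries no quantity; 0 is a placeholder
    m = _FORAGE.fullmatch(token)
    if m:
        return ("forage", int(m.group(1)))
    m = _STEAL.fullmatch(token)
    if m:
        return ("steal", int(m.group(1)))
    return None


if __name__ == "__main__":
    for tok in ["[Forage:2]", "[Steal:3]", "[Defend]", "[Forage:5]", "[Steal honey]", "[Hide]"]:
        print(tok, "->", parse_action(tok))
```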
---
## 5. Game State Schema
```json
{
"turn_number": 7,
"current_player": "BearB",
"hive_honey": 12,
"max_turns": 20,
"players": {
"BearA": {
"stored_honey": 9,
"last_action": "[Forage:2]",
"defending": false,
"score": 9
},
"BearB": {
"stored_honey": 10,
"last_action": "[Steal:3]",
"defending": false,
"score": 10
}
},
"history": [
{"turn":1,"actor":"BearA","action":"[Forage:3]"},
{"turn":2,"actor":"BearB","action":"[Defend]"},
{"turn":3,"actor":"BearA","action":"[Steal:2]"}
],
"winner": null,
"draw": false,
"seed": 12345
}
```
---
## 6. Initialization Rules
- **Hive honey** starts between **15 and 20**, seeded deterministically.
- **Stored honey** for each player starts at **0**.
- **Turn number** begins at **1**, with **BearA** going first.
- A random seed determines only initial hive honey; all later effects are deterministic.
- On **reset()**, both players receive initial observations describing:
  - Starting hive size
  - Their honey score (=0)
  - Available action grammar
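
One way to realize these initialization rules is sketched below. The function name `initial_state` is hypothetical, and the use of a private `random.Random(seed)` instance is an assumption; the document only requires that the starting hive size be reproducible from the seed. The keys mirror the schema in section 5.

```python
import random
from typing import Any, Dict


def initial_state(seed: int, max_turns: int = 20) -> Dict[str, Any]:
    """Build a fresh game state; only hive_honey depends on the seed."""
    rng = random.Random(seed)        # private RNG, leaves the global random state alone
    hive_honey = rng.randint(15, 20)  # randint is inclusive on both ends

    def new_player() -> Dict[str, Any]:
        return {"stored_honey": 0, "last_action": None, "defending": False, "score": 0}

    return {
        "turn_number": 1,
        "current_player": "BearA",
        "hive_honey": hive_honey,
        "max_turns": max_turns,
        "players": {"BearA": new_player(), "BearB": new_player()},
        "history": [],
        "winner": None,
        "draw": False,
        "seed": seed,
    }


if __name__ == "__main__":
    state = initial_state(12345)
    print(state["hive_honey"], state["current_player"])
```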
---
## 7. Validation and Error Handling
### Illegal Actions
The environment validates the extracted content (from within `\boxed{}`).
**Invalid Move Reasons:**
1. Action string **empty or malformed** → `"Invalid format, must use [Forage:X], [Steal:X], or [Defend]."`
2. **X out of range** for `[Forage:X]` or `[Steal:X]` → `"Illegal quantity, X must be 1-3."`
3. Attempt to forage more honey than exists in hive → `"Not enough honey in hive."`
4. Attempt to steal more honey than opponent has → `"Opponent has insufficient honey."`
5. Action submitted after terminal state → `"Game is already over."`
When validation fails, the environment triggers `set_invalid_move(reason)` and ends the turn without effect.
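
The five checks can be expressed as a single function that returns the reason string, or `None` for a legal move. This is a sketch under assumptions: the name `invalid_reason`, its parameters, and the order in which the checks run (terminal state first) are illustrative choices, not something the document fixes. The caller would pass a non-`None` result to `set_invalid_move(reason)`.

```python
import re
from typing import Optional

# Loose pattern: accepts any integer so that out-of-range X gets its own message.
_QUANTITY_TOKEN = re.compile(r"\[(Forage|Steal):(-?\d+)\]")


def invalid_reason(token: str, hive_honey: int, opponent_honey: int,
                   game_over: bool = False) -> Optional[str]:
    """Return the invalid-move reason for an extracted token, or None if it is legal."""
    if game_over:
        return "Game is already over."
    if token == "[Defend]":
        return None
    m = _QUANTITY_TOKEN.fullmatch(token)
    if m is None:
        return "Invalid format, must use [Forage:X], [Steal:X], or [Defend]."
    kind, x = m.group(1), int(m.group(2))
    if not 1 <= x <= 3:
        return "Illegal quantity, X must be 1-3."
    if kind == "Forage" and x > hive_honey:
        return "Not enough honey in hive."
    if kind == "Steal" and x > opponent_honey:
        return "Opponent has insufficient honey."
    return None
```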
---
## 8. Terminal Conditions and Scoring
### Checked after each pair of turns (full round):
1. **Hive depletion:** if `hive_honey <= 0`, game ends.
2. **Turn limit:** if `turn_number > max_turns`, game ends.
3. **All honey gathered:** if total honey = 0 (hive + players), end early.
**Scoring:**
- Each bear's score = `stored_honey`.
- **Winner = bear with higher stored_honey.**
- **Draw** if equal.
**Tie-breaker:** If equal and turns remain, continue until max_turns; if still equal at end → draw.
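
A minimal sketch of the terminal check and scoring follows. The function name `check_terminal` and its `(done, winner, draw)` return tuple are assumptions for illustration. Deciding when to call it (for example, only after a full round) is left to the caller. Note that the "all honey gathered" condition implies an empty hive, so it is kept here only to mirror the list above.

```python
from typing import Optional, Tuple


def check_terminal(hive_honey: int, turn_number: int, max_turns: int,
                   honey_a: int, honey_b: int) -> Tuple[bool, Optional[str], bool]:
    """Return (done, winner, draw) from the current totals."""
    total_honey = hive_honey + honey_a + honey_b
    done = (
        hive_honey <= 0              # hive depletion
        or turn_number > max_turns   # turn limit
        or total_honey == 0          # nothing left anywhere (implies empty hive)
    )
    if not done:
        return (False, None, False)
    if honey_a > honey_b:
        return (True, "BearA", False)
    if honey_b > honey_a:
        return (True, "BearB", False)
    return (True, None, True)  # equal stores at game end: draw
```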
---
## 9. Player Prompt Specification
Each player receives a contextual prompt describing their role and the rules:
---
### Prompt Outline
```
You are a hungry bear competing for the last honey in the forest.
- Your goal: End the game with more honey than your rival.
- Each turn, choose ONE of the following actions:
[Forage:X] Gather X units (1-3) from the hive.
[Defend] Protect your honey from theft this turn.
[Steal:X] Steal X units (1-3) from your rival if they do not defend.
Game facts:
- Hive honey remaining: {hive_honey}
- Your stored honey: {player_honey}
- Rival stored honey: {rival_honey}
- Turn {turn_number} / {max_turns}
Format rule:
State your reasoning briefly, then put your final action in the following format at the end:
"Put your final answer within \\boxed{{}} at the end of your response."
Example valid response:
I will risk a quick grab of honey before the hive gets low.
\boxed{{[Forage:3]}}
Example invalid response:
\boxed{{Forage 3}} <-- Must include brackets and colon.
```
---
A helper `_extract_answer_content(self, action: str) -> str` will strip the outer `\boxed{}` to obtain the core action token for parsing and validation.
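
A possible shape for that helper is sketched below as a free function. Two behaviors here are assumptions rather than documented rules: the last `\boxed{...}` in the response wins, and a response with no box falls back to the stripped input. The non-greedy match is sufficient because valid action tokens contain no braces. In a rendered prompt the doubled braces of the template collapse to single ones, so the model's reply contains `\boxed{[Forage:3]}`.

```python
import re

# Matches a literal \boxed{ ... } and captures the content up to the first closing brace.
_BOXED = re.compile(r"\\boxed\{(.*?)\}", re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the content of the last \\boxed{...} in the response, stripped of whitespace."""
    matches = _BOXED.findall(action)
    if matches:
        return matches[-1].strip()
    # Assumption: with no box present, hand the raw text to validation unchanged.
    return action.strip()


if __name__ == "__main__":
    reply = "I will risk a quick grab of honey before the hive gets low.\n\\boxed{[Forage:3]}"
    print(extract_answer_content(reply))
```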
---
## 10. API Mapping Plan
### `reset(seed)`
- Initialize `game_state` fields as specified.
- Apply seed to determine initial `hive_honey`.
- Set both `stored_honey` = 0, `defending`=False.
- Return the first observation for **BearA**.
### `step(action)`
- Extract content from the player's input using `_extract_answer_content`.
- Validate; if illegal, set invalid move and skip reward changes.
- Apply deterministic resolution:
  - `[Forage:X]`: Reduce hive honey by X, add to the player's store.
  - `[Defend]`: Set defending=True.
  - `[Steal:X]`: If opponent.defending=False, transfer min(X, opponent.honey) to current player.
- Reset defending flags at the end of round.
- Increment `turn_number`, switch `current_player`.
- Check terminal conditions and set `winner`/`draw` if applicable.
- Produce updated observation for next player.
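
The resolution steps can be sketched as a function that mutates the state for one already-validated action. The name `apply_action` and the `(kind, amount)` arguments are hypothetical, and "end of round" is read here as "right after Bear B's turn", which is an assumption. Terminal checks, history, and observations are omitted.

```python
from typing import Any, Dict


def apply_action(state: Dict[str, Any], kind: str, amount: int = 0) -> None:
    """Apply one validated action for state['current_player'], then advance the turn."""
    me = state["current_player"]
    rival = "BearB" if me == "BearA" else "BearA"
    players = state["players"]

    if kind == "forage":
        state["hive_honey"] -= amount
        players[me]["stored_honey"] += amount
    elif kind == "defend":
        players[me]["defending"] = True
    elif kind == "steal" and not players[rival]["defending"]:
        taken = min(amount, players[rival]["stored_honey"])
        players[rival]["stored_honey"] -= taken
        players[me]["stored_honey"] += taken

    # Assumed reading of "end of round": Bear B has just moved.
    if me == "BearB":
        for player in players.values():
            player["defending"] = False

    state["turn_number"] += 1
    state["current_player"] = rival
```

Under this literal reading, a `[Defend]` played by Bear B is cleared before Bear A's next turn and so never blocks a steal. If that is not intended, one alternative is to clear a bear's flag at the start of its own next turn instead.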
### `_generate_player_prompt(player_id)`
- Inserts current state data into the prompt template (see section 9).
- Explains available actions, format rules, and examples.
- Used each turn to query model actions in deterministic play.
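
The doubled braces in the section 9 template (`\boxed{{}}`) suggest it is filled with Python's `str.format`, where `{{` and `}}` render as literal braces. The sketch below assumes that mechanism and uses an abbreviated template; `PROMPT_TEMPLATE` and the free function `generate_player_prompt` are illustrative names, and the state keys follow section 5.

```python
from typing import Any, Dict

# Abbreviated template; {{}} becomes a literal {} after str.format.
PROMPT_TEMPLATE = (
    "Game facts:\n"
    "- Hive honey remaining: {hive_honey}\n"
    "- Your stored honey: {player_honey}\n"
    "- Rival stored honey: {rival_honey}\n"
    "- Turn {turn_number} / {max_turns}\n"
    "Put your final answer within \\boxed{{}} at the end of your response."
)


def generate_player_prompt(state: Dict[str, Any], player_id: str) -> str:
    """Fill the template with the current state from player_id's point of view."""
    rival_id = "BearB" if player_id == "BearA" else "BearA"
    return PROMPT_TEMPLATE.format(
        hive_honey=state["hive_honey"],
        player_honey=state["players"][player_id]["stored_honey"],
        rival_honey=state["players"][rival_id]["stored_honey"],
        turn_number=state["turn_number"],
        max_turns=state["max_turns"],
    )
```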
---
## 11. Copy-Check Against Example
This design:
- Uses **bears, honey, and forest resources**, not trade, offers, or negotiation.
- Has **unique resource names**: `hive_honey`, `stored_honey`, `defending`, etc.
- Defines **original objectives** (collect more honey) and **distinct action grammar**.
- Uses distinct theme and prompt text centered on **competitive foraging bears**.
All terminology and `game_state` keys are purely original to *Honey Heist: Battle of the Bears*.