diff --git a/environment.md b/environment.md
new file mode 100644
index 0000000..53d3e39
--- /dev/null
+++ b/environment.md
@@ -0,0 +1,201 @@
+# GAME DESIGN DOCUMENT – **"StarGrid Duel"**
+
+---
+
+### 1. Concept Paragraph
+**StarGrid Duel** is a deterministic, turn-based strategy game inspired by the simplicity of grid conquest, but it is **not** tic‑tac‑toe. Two rival star‑navigators take turns deploying *energy beacons* on a 3×3 stellar grid. Their aim is to align three of their own beacons in a straight line of cosmic power (horizontal, vertical, or diagonal) before the opponent does, or to fill the grid entirely for a balanced standoff. Players issue commands such as `[Place: A2]` to place a beacon on a coordinate. The environment is purely deterministic: no randomness or negotiation mechanics are involved. The game’s purpose is to measure spatial foresight and terminal pattern recognition; it is completely unrelated to any negotiation or resource trading examples.
+
+---
+
+### 2. Roles and Win Condition
+- **Roles**
+  - **Player A ("Navigator Alpha")**: Uses energy color **Blue**.
+  - **Player B ("Navigator Beta")**: Uses energy color **Crimson**.
+
+- **Objective**
+  Be the first navigator to align three of your beacons continuously (row, column, or diagonal) on the 3×3 StarGrid.
+
+- **Win Rule**
+  - A player **wins** immediately upon forming a line of three of their own beacons.
+  - The game is a **draw** if all nine cells are filled without a three‑in‑a‑line configuration.
+  - Upon win or draw, the game enters a terminal state and no further actions are accepted.
+
+---
+
+### 3. Turn Structure and Determinism
+- Players alternate turns beginning with Player A at turn index `0`.
+- Each turn is atomic: exactly one action is taken.
+- A deterministic seed ensures that initialization and any potential random ordering (none required here, but included for reproducibility) follow identical patterns across runs.
+- The turn counter increments after each valid action. Once nine valid turns have been processed or a win condition is met, the environment halts.
+
+---
+
+### 4. Action Grammar (Machine‑Parsable)
+Players specify grid placement commands targeting one unused cell.
+
+**Allowed Actions**
+```
+[Place: ]
+```
+
+**Cell IDs**
+Valid values: `A1, A2, A3, B1, B2, B3, C1, C2, C3` (Rows A–C, Columns 1–3)
+
+**Formal Pattern (Regex)**
+`^\[Place:\s*(A|B|C)(1|2|3)\]$`
+
+**Examples**
+- **Valid:** `[Place: B2]` → Places player’s beacon in the center cell.
+- **Invalid Examples:**
+  - `[place: B2]` → Lowercase `place` does not match the case-sensitive `Place` token.
+  - `[Place: D1]` → `D1` not in allowed grid range.
+  - `[Deploy: A1]` → Invalid action token.
+  - `[Place: B2 extra]` → Extra text violates strict grammar.
+
+All player outputs will be wrapped in `\boxed{{…}}`. The implementation extracts the inner `[Place: X#]` command and validates it against the pattern above.
+
+---
+
+### 5. Game State Schema
+```json
+{
+  "turn_index": 4,
+  "active_player": "A",
+  "board": {
+    "A1": "Blue",
+    "A2": null,
+    "A3": "Crimson",
+    "B1": "Blue",
+    "B2": "Crimson",
+    "B3": null,
+    "C1": null,
+    "C2": null,
+    "C3": null
+  },
+  "player_symbols": {
+    "A": "Blue",
+    "B": "Crimson"
+  },
+  "move_history": [
+    {"player": "A", "action": "[Place: A1]"},
+    {"player": "B", "action": "[Place: A3]"},
+    {"player": "A", "action": "[Place: B1]"},
+    {"player": "B", "action": "[Place: B2]"}
+  ],
+  "winner": null,
+  "is_draw": false,
+  "observations": {
+    "A": "Text transcript of latest game state for Alpha",
+    "B": "Text transcript of latest game state for Beta"
+  },
+  "seed": 42
+}
+```
+
+---
+
+### 6. Initialization Rules
+- `reset(seed)` initializes an empty 3×3 board with all cells `null`.
+- The turn index resets to `0` with `active_player = "A"`.
+- The same seed always ensures that turn order, board labeling, and any deterministic tie logic behave identically.
+- Both players receive an onboarding observation describing:
+  - Empty StarGrid layout
+  - Their color and symbol
+  - Instructions and the legal action syntax
+
+---
+
+### 7. Validation and Error Handling
+- Upon receiving a player move, extract the content inside `\boxed{{}}` using `_extract_answer_content`.
+- Validate against the regex `^\[Place:\s*(A|B|C)(1|2|3)\]$`.
+- Check that the specified cell is unoccupied.
+- **Invalid Move Reasons**
+  - `"MalformedAction"`: Does not match required pattern.
+  - `"CellOutOfRange"`: Coordinate not part of StarGrid labels.
+  - `"CellOccupied"`: Target cell already taken.
+  - `"NotYourTurn"`: Attempt to act out of sequence, for example after the game has ended or when it is the opponent’s turn.
+  The environment calls `set_invalid_move(reason)` with a human-readable reason, retaining determinism (the turn is forfeited or handled as a draw according to policy).
+
+---
+
+### 8. Terminal Conditions and Scoring
+**Checks run each turn, immediately after a valid beacon placement:**
+1. **Victory Check** – If the current player’s beacons form any of the eight winning line patterns, set `winner = active_player` and terminate the game.
+2. **Draw Check** – If no empty cells remain and no winner exists, set `is_draw = true`.
+3. **Scoring** –
+   - Win: `+1` score for winner, `0` for loser.
+   - Draw: `0.5` each as tie credit (for potential series mode).
+
+**Tie‑Break Procedure**
+If multiple win conditions appear simultaneously (impossible under normal rules), the first detected alignment pattern is applied deterministically.
+
+---
+
+### 9. Player Prompt Specification
+Each player receives a structured prompt reflecting the current board and legal moves.
+
+**Prompt Outline**
+
+> **Identity Blurb:**
+> You are a star navigator placing energy beacons on a galactic grid. Each cell you claim radiates your color’s energy. The goal is to align three of your beacons in a line before the opponent.
+
+> **Current Board State:**
+> - Display a 3×3 grid with coordinates and current occupancy.
+
+> **Your Color:** Blue or Crimson
+> **Turn Information:** Which player moves next (`Navigator Alpha` or `Navigator Beta`)
+
+> **Allowed Actions:**
+> Format: `[Place: ]`, where `` ∈ {A1,…,C3} and the cell must be empty.
+> You must wrap your selected action inside `\boxed{{}}` at the end of your message.
+
+> **Response Format:**
+> You may reason about your move, then output your final choice within `\boxed{{}}`.
+
+**Few‑Shot Examples**
+
+```
+Example valid response:
+I will claim the center of the grid to control diagonals.
+\boxed{{[Place: B2]}}
+
+Example invalid response:
+I think I'll move now.
+\boxed{{[Move: B2]}} ← "Move" not a valid token.
+```
+
+The function `_extract_answer_content(self, action: str) -> str` will remove `\boxed{{}}` wrappers and yield `[Place: X#]` for validation.
+
+---
+
+### 10. API Mapping Plan
+
+- **`reset(seed)`**
+  - Sets initial empty `board`, `turn_index=0`, and seeds RNG for determinism.
+  - Returns initial observations (`"Navigator Alpha"`, `"Navigator Beta"`).
+
+- **`step(player_action)`**
+  - Extracts action token with `_extract_answer_content`.
+  - Validates syntax and target cell availability.
+  - Updates `board`, appends to `move_history`, increments `turn_index`.
+  - After update, executes terminal checks (victory or draw).
+  - Produces new observations describing updated board state.
+
+- **`_generate_player_prompt(player_id)`**
+  - Compiles textual description of board, current scores, and open cells.
+  - Lists permitted `[Place: ]` choices.
+  - Concludes with directive: *Put your final answer within \boxed{{}} at the end of your response.*
+
+All actions and resultant board states are deterministic given identical seeds and action sequences.
+
+---
+
+### 11. Copy‑Check Against the Example
+- The environment, terminology, and objective are **entirely original**.
+- There is **no negotiation**, **no trading**, **no resource exchange**, and **no alignment with any bargaining mechanics** from the example environment.
+- Entities (“Navigator Alpha/Beta,” “energy beacons,” “StarGrid”) and game state keys (`board`, `player_symbols`, `move_history`, etc.) are unique to this design.
+- The theme is cosmic grid conquest, **not** any prior example domain.
+
+---
+
+**End of Design Document – “StarGrid Duel”**
\ No newline at end of file
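
The extraction, validation, and victory checks described in sections 4, 7, and 8 of `environment.md` could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the environment's actual implementation: the standalone functions `extract_answer_content`, `validate_action`, and `find_winner` are hypothetical names, and the looser `LOOSE_RE` pattern is an assumption added only so that `"CellOutOfRange"` can be told apart from `"MalformedAction"` (the document defines only the strict regex). The `"NotYourTurn"` reason and the invalid-move policy are left out.

```python
import re
from typing import Dict, Optional, Tuple

# Strict grammar copied from section 4 of the design document.
STRICT_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
# Assumption: a looser shape check, not defined in the document, used only
# to distinguish "CellOutOfRange" from "MalformedAction".
LOOSE_RE = re.compile(r"^\[Place:\s*([A-Za-z])(\d)\]$")
# Accepts both \boxed{...} and the doubled-brace form \boxed{{...}}.
BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+", re.DOTALL)

# The eight winning lines: three rows, three columns, two diagonals.
WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),
]

Board = Dict[str, Optional[str]]


def extract_answer_content(text: str) -> str:
    """Return the content of the last boxed wrapper, or the stripped text."""
    matches = BOXED_RE.findall(text)
    return matches[-1].strip() if matches else text.strip()


def validate_action(raw: str, board: Board) -> Tuple[Optional[str], Optional[str]]:
    """Return (cell, None) for a legal move, otherwise (None, reason)."""
    action = extract_answer_content(raw)
    match = STRICT_RE.match(action)
    if match is None:
        # Well-formed shape but an unknown coordinate, e.g. [Place: D1].
        if LOOSE_RE.match(action):
            return None, "CellOutOfRange"
        return None, "MalformedAction"
    cell = match.group(1) + match.group(2)
    if board[cell] is not None:
        return None, "CellOccupied"
    return cell, None


def find_winner(board: Board) -> Optional[str]:
    """Return the color that owns the first completed line, if any."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


# Board from the section 5 schema example.
board: Board = {row + col: None for row in "ABC" for col in "123"}
board.update({"A1": "Blue", "A3": "Crimson", "B1": "Blue", "B2": "Crimson"})
cell, reason = validate_action(r"\boxed{{[Place: C1]}}", board)
```

Scanning `WIN_LINES` in a fixed order also gives the deterministic "first detected alignment" behavior that the tie-break procedure asks for.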