Add environment documentation from Openverse builder

2001-01-01 00:00:00 +00:00
parent 1e40154fa0
commit f955406004
1 changed files with 201 additions and 0 deletions
--- a/environment.md
+++ b/environment.md
@@ -0,0 +1,201 @@
 # GAME DESIGN DOCUMENT — **"StarGrid Duel"**
 ---
 ### 1. Concept Paragraph
 **StarGrid Duel** is a deterministic, turn-based strategy game inspired by the simplicity of grid conquest, but it is **not** tic‑tac‑toe. Two rival star‑navigators take turns deploying *energy beacons* on a 3×3 stellar grid. Each aims to align three of their own beacons in a straight line (horizontal, vertical, or diagonal) before the opponent does; if the grid fills with no such line, the game ends in a draw. Players issue commands such as `[Place: A2]` to deposit a beacon on a coordinate. The environment is purely deterministic: it involves no randomness and no negotiation mechanics. The game is meant to measure spatial foresight and recognition of terminal patterns, and it is unrelated to any negotiation or resource-trading example.
 ---
 ### 2. Roles and Win Condition
 - **Roles**  
  - **Player A ("Navigator Alpha")**: Uses energy color **Blue**.  
  - **Player B ("Navigator Beta")**: Uses energy color **Crimson**.  
 - **Objective**  
  Be the first navigator to align three of your beacons continuously (row, column, or diagonal) on the 3×3 StarGrid.
 - **Win Rule**
  - A player **wins** immediately upon forming a line of three of their own symbols.
  - The game is a **draw** if all nine cells are filled without a three‑in‑a‑line configuration.
  - Upon win or draw, the game enters a terminal state and no further actions are accepted.
 ---
 ### 3. Turn Structure and Determinism
 - Players alternate turns beginning with Player A at turn index `0`.
 - Each turn is atomic: exactly one action is taken.
 - The game itself uses no randomness. A seed is still accepted so that initialization and any future randomized ordering are reproducible across runs.
 - The turn counter increments after each valid action. Once nine valid turns have been processed or a win condition is met, the environment halts.
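
The alternation rule above can be expressed as a parity check on the turn index. This is a minimal sketch: the helper name `active_player_for` is hypothetical, and it assumes the index advances only on valid actions, as the last bullet states.

```python
def active_player_for(turn_index: int) -> str:
    """Player A moves on even turn indices, Player B on odd ones."""
    if turn_index < 0:
        raise ValueError("turn_index must be non-negative")
    return "A" if turn_index % 2 == 0 else "B"
```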
 ---
 ### 4. Action Grammar (Machine‑Parsable)
 Players specify grid placement commands targeting one unused cell.
 **Allowed Actions**
 ```
 [Place: <cell_id>]
 ```
 **Cell IDs**  
 Valid values: `A1, A2, A3, B1, B2, B3, C1, C2, C3` (Rows A–C, Columns 1–3)
 **Formal Pattern (Regex)**  
 `^\[Place:\s*(A|B|C)(1|2|3)\]$`
 **Examples**
 - **Valid:** `[Place: B2]` → Places player’s beacon in the center cell.
 - **Invalid Examples:**  
  - `[place: B2]` → The action token is case-sensitive; `place` must be `Place`.  
  - `[Place: D1]` → `D1` not in allowed grid range.  
  - `[Deploy: A1]` → Invalid action token.  
  - `[Place: B2 extra]` → Extra text violates strict grammar.
 Player outputs are wrapped in `\boxed{{…}}`. The implementation extracts the inner `[Place: X#]` command and validates it against the pattern above.
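
As a rough illustration of the grammar, the pattern can be compiled once and used to turn a command into a cell id. The helper name `parse_place` is an assumption made for this sketch; only the regular expression comes from the specification. Note that in Python `$` also matches just before a single trailing newline, so an implementation that must reject it could use `\Z` instead.

```python
import re
from typing import Optional

# Pattern copied from the grammar above: group 1 is the row, group 2 the column.
PLACE_PATTERN = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")


def parse_place(command: str) -> Optional[str]:
    """Return the cell id (for example "B2") of a well-formed command, else None."""
    match = PLACE_PATTERN.match(command)
    if match is None:
        return None
    return match.group(1) + match.group(2)
```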
 ---
 ### 5. Game State Schema
 ```json
 {
  "turn_index": 4,
  "active_player": "A",
  "board": {
    "A1": "Blue",
    "A2": null,
    "A3": "Crimson",
    "B1": "Blue",
    "B2": "Crimson",
    "B3": null,
    "C1": null,
    "C2": null,
    "C3": null
  },
  "player_symbols": {
    "A": "Blue",
    "B": "Crimson"
  },
  "move_history": [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"}
  ],
  "winner": null,
  "is_draw": false,
  "observations": {
    "A": "Text transcript of latest game state for Alpha",
    "B": "Text transcript of latest game state for Beta"
  },
  "seed": 42
 }
 ```
 ---
 ### 6. Initialization Rules
 - `reset(seed)` initializes an empty 3×3 board with all cells `null`.
 - The turn index resets to `0` with `active_player = "A"`.
 - Resetting with the same seed always yields the same turn order, board labeling, and tie-handling behavior.
 - Both players receive an onboarding observation describing:
  - Empty StarGrid layout
  - Their color and symbol
  - Instructions and the legal action syntax
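
The initialization rules might translate into a state builder along these lines. The function name `make_initial_state` is hypothetical, the keys follow the schema in Section 5, and the empty observation strings are placeholders that a real `reset(seed)` would fill with the onboarding text.

```python
from typing import Any, Dict

# Cell ids in row-major order: A1, A2, A3, B1, ..., C3.
CELL_IDS = [row + col for row in "ABC" for col in "123"]


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Build the empty-board state described by the initialization rules."""
    return {
        "turn_index": 0,
        "active_player": "A",
        "board": {cell: None for cell in CELL_IDS},
        "player_symbols": {"A": "Blue", "B": "Crimson"},
        "move_history": [],
        "winner": None,
        "is_draw": False,
        "observations": {"A": "", "B": ""},  # placeholder onboarding text
        "seed": seed,  # stored for reproducibility; no randomness is drawn
    }
```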
 ---
 ### 7. Validation and Error Handling
 - Upon receiving a player move, extract the content inside `\boxed{{}}` using `_extract_answer_content`.
 - Validate against the regex `^\[Place:\s*(A|B|C)(1|2|3)\]$`.
 - Check that the specified cell is unoccupied.
 - **Invalid Move Reasons**  
  - `"MalformedAction"`: Does not match required pattern.  
  - `"CellOutOfRange"`: Coordinate not part of StarGrid labels.  
  - `"CellOccupied"`: Target cell already taken.  
  - `"NotYourTurn"`: The player acts out of turn or after the game has reached a terminal state.  
  The environment calls `set_invalid_move(reason)` with a human-readable reason. Handling stays deterministic: depending on policy, the turn is forfeited or the game is treated as a draw.
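
One possible ordering of these checks is sketched below. The strict pattern already rejects coordinates such as `D1`, so reporting `"CellOutOfRange"` separately needs an extra rule; the looser `LOOSE_PATTERN` used for that purpose is an assumption of this sketch, as are the function name `invalid_reason` and its parameters. The result would then be handed to `set_invalid_move`.

```python
import re
from typing import Dict, Optional

PLACE_PATTERN = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
# Assumption: a looser shape, used only to tell an out-of-range coordinate
# apart from a command that is malformed in some other way.
LOOSE_PATTERN = re.compile(r"^\[Place:\s*([A-Z])([0-9])\]$")


def invalid_reason(
    command: str,
    board: Dict[str, Optional[str]],
    acting_player: str,
    active_player: str,
    game_over: bool,
) -> Optional[str]:
    """Return one of the documented reason codes, or None for a legal move."""
    if game_over or acting_player != active_player:
        return "NotYourTurn"
    match = PLACE_PATTERN.match(command)
    if match is None:
        if LOOSE_PATTERN.match(command):
            return "CellOutOfRange"
        return "MalformedAction"
    cell = match.group(1) + match.group(2)
    if board[cell] is not None:
        return "CellOccupied"
    return None
```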
 ---
 ### 8. Terminal Conditions and Scoring
 **Checks run each turn, immediately after a valid beacon is placed:**
 1. **Victory Check** – If the current player’s beacons form any of the eight winning line patterns, set `winner = active_player` and terminate the game.
 2. **Draw Check** – If no empty cells remain and no winner exists, set `is_draw = true`.
 3. **Scoring** –  
   - Win: `+1` score for winner, `0` for loser.  
   - Draw: `0.5` each as tie credit (for potential series mode).
 **Tie‑Break Procedure**  
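 A single placement can complete two lines at once, for example a row and a diagonal through the same cell. Both lines belong to the player who just moved, so the winner is unambiguous; the first detected alignment pattern is recorded deterministically.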
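
The checks and scoring above can be sketched as follows. The names `WIN_LINES`, `find_winning_line`, `board_is_full`, and `scores` are illustrative, and the scan order (rows, then columns, then diagonals) is an assumption; any fixed order keeps the result deterministic.

```python
from typing import Dict, Optional, Tuple

# The eight winning lines, scanned in a fixed order.
WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),  # rows
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),  # columns
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),                      # diagonals
]


def find_winning_line(
    board: Dict[str, Optional[str]], color: str
) -> Optional[Tuple[str, str, str]]:
    """Return the first line fully held by `color`, or None."""
    for line in WIN_LINES:
        if all(board[cell] == color for cell in line):
            return line
    return None


def board_is_full(board: Dict[str, Optional[str]]) -> bool:
    """True when no empty cell remains (a draw if nobody has won)."""
    return all(value is not None for value in board.values())


def scores(winner: Optional[str]) -> Dict[str, float]:
    """Win: 1 for the winner and 0 for the loser. Draw: 0.5 each."""
    if winner is None:
        return {"A": 0.5, "B": 0.5}
    return {"A": 1.0 if winner == "A" else 0.0,
            "B": 1.0 if winner == "B" else 0.0}
```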
 ---
 ### 9. Player Prompt Specification
 Each player receives a structured prompt reflecting the current board and legal moves.
 **Prompt Outline**
 > **Identity Blurb:**  
 > You are a star navigator placing energy beacons on a galactic grid. Each cell you claim radiates your color’s energy. The goal is to align three of your beacons in a line before the opponent.
 > **Current Board State:**  
 > - Display a 3×3 grid with coordinates and current occupancy.
 > **Your Color:** Blue or Crimson  
 > **Turn Information:** Which player moves next (`Navigator Alpha` or `Navigator Beta`)  
 > **Allowed Actions:**  
 > Format: `[Place: <cell_id>]`, where `<cell_id>` ∈ {A1,…,C3} and the cell must be empty.  
 > You must wrap your selected action inside `\boxed{{}}` at the end of your message.
 > **Response Format:**  
 > You may reason about your move, then output your final choice within `\boxed{{}}`.
 **Few‑Shot Examples**
 ```
 Example valid response:
 I will claim the center of the grid to control diagonals.
 \boxed{{[Place: B2]}}
 Example invalid response:
 I think I'll move now.
 \boxed{{[Move: B2]}}  ← "Move" not a valid token.
 ```
 The function `_extract_answer_content(self, action: str) -> str` will remove `\boxed{{}}` wrappers and yield `[Place: X#]` for validation.
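
A standalone sketch of what such an extraction could look like is given below; it is not the actual method. It assumes the last boxed wrapper in the response is the final answer, tolerates both single and doubled braces, and falls back to the stripped input when no wrapper is present. Valid commands contain no braces, so a non-greedy match is enough here.

```python
import re

# One or more opening braces, the shortest possible body, then closing braces.
BOXED_PATTERN = re.compile(r"\\boxed\{+(.*?)\}+", flags=re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the text inside the last boxed wrapper of a response."""
    matches = BOXED_PATTERN.findall(action)
    if matches:
        return matches[-1].strip()
    # Assumption: with no wrapper, pass the raw text on so that validation
    # can reject it as malformed.
    return action.strip()
```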
 ---
 ### 10. API Mapping Plan
 - **`reset(seed)`**  
  - Sets initial empty `board`, `turn_index=0`, and seeds RNG for determinism.  
  - Returns initial observations (`"Navigator Alpha"`, `"Navigator Beta"`).  
 - **`step(player_action)`**  
  - Extracts action token with `_extract_answer_content`.  
  - Validates syntax and target cell availability.  
  - Updates `board`, appends to `move_history`, increments `turn_index`.  
  - After update, executes terminal checks (victory or draw).  
  - Produces new observations describing updated board state.  
 - **`_generate_player_prompt(player_id)`**  
  - Compiles textual description of board, current scores, and open cells.  
  - Lists permitted `[Place: <cell_id>]` choices.  
  - Concludes with directive: *Put your final answer within \boxed{{}} at the end of your response.*
 All actions and resultant board states are deterministic given identical seeds and action sequences.
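
The determinism claim can be illustrated by replaying a list of already-validated cell ids and comparing the resulting states. This is a simplified stand-in for `step`, not the real method: the function name `replay` is hypothetical, parsing and observations are omitted, and illegal placements simply raise `ValueError`.

```python
from typing import Dict, List, Optional

WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),  # rows
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),  # columns
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),                      # diagonals
]
SYMBOLS = {"A": "Blue", "B": "Crimson"}


def replay(cells: List[str]) -> Dict[str, object]:
    """Apply cell ids in order, alternating players, and return the final state."""
    board: Dict[str, Optional[str]] = {r + c: None for r in "ABC" for c in "123"}
    state = {"turn_index": 0, "active_player": "A", "board": board,
             "move_history": [], "winner": None, "is_draw": False}
    for cell in cells:
        if state["winner"] is not None or state["is_draw"]:
            raise ValueError("game already in a terminal state")
        if board.get(cell, "missing") is not None:
            raise ValueError(f"illegal placement: {cell}")
        player = state["active_player"]
        board[cell] = SYMBOLS[player]
        state["move_history"].append({"player": player, "action": f"[Place: {cell}]"})
        state["turn_index"] += 1
        # Terminal checks run right after the placement: victory, then draw.
        if any(all(board[c] == SYMBOLS[player] for c in line) for line in WIN_LINES):
            state["winner"] = player
        elif all(value is not None for value in board.values()):
            state["is_draw"] = True
        else:
            state["active_player"] = "B" if player == "A" else "A"
    return state
```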
 ---
 ### 11. Copy‑Check Against the Example
 - The environment, terminology, and objective are **entirely original**.  
 - There is **no negotiation**, **no trading**, **no resource exchange**, and **no alignment with any bargaining mechanics** from the example environment.  
 - Entities (“Navigator Alpha/Beta,” “energy beacons,” “StarGrid”) and game state keys (`board`, `player_symbols`, `move_history`, etc.) are unique to this design.  
 - The theme is cosmic grid conquest, **not** any prior example domain.
 ---
 **End of Design Document – “StarGrid Duel”**