# GAME DESIGN DOCUMENT — **"StarGrid Duel"**

---

### 1. Concept Paragraph
**StarGrid Duel** is a deterministic, turn‑based strategy game inspired by the simplicity of grid conquest, but it is **not** tic‑tac‑toe. Two rival star‑navigators take turns deploying *energy beacons* on a 3×3 stellar grid. Each aims to align three of their own beacons in a straight line (horizontal, vertical, or diagonal) before the opponent does; if the grid fills with no such line, the game ends in a draw. Players issue commands such as `[Place: A2]` to place a beacon on a coordinate. The environment is purely deterministic: it involves no randomness and no negotiation mechanics. The game is meant to measure spatial foresight and recognition of terminal patterns, and it is unrelated to any negotiation or resource‑trading example.

---

### 2. Roles and Win Condition
- **Roles**
  - **Player A ("Navigator Alpha")**: Uses energy color **Blue**.
  - **Player B ("Navigator Beta")**: Uses energy color **Crimson**.

- **Objective**
  Be the first navigator to align three of your beacons continuously (row, column, or diagonal) on the 3×3 StarGrid.

- **Win Rule**
  - A player **wins** immediately upon forming a line of three of their own symbols.
  - The game is a **draw** if all nine cells are filled without a three‑in‑a‑line configuration.
  - Upon win or draw, the game enters a terminal state and no further actions are accepted.

---

### 3. Turn Structure and Determinism
- Players alternate turns beginning with Player A at turn index `0`.
- Each turn is atomic: exactly one action is taken.
- The environment accepts a seed so that initialization is reproducible. The game itself needs no random ordering; the seed is included only for reproducibility.
- The turn counter increments after each valid action. Once nine valid turns have been processed or a win condition is met, the environment halts.
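
The alternation and halting rules above can be sketched in a few lines. This is a minimal illustration, not the environment's actual code; the names `active_player`, `is_halted`, and `MAX_TURNS` are hypothetical.

```python
from typing import Optional

PLAYERS = ("A", "B")  # Navigator Alpha moves first, at turn index 0
MAX_TURNS = 9         # one valid turn per cell on the 3x3 grid


def active_player(turn_index: int) -> str:
    """Return the player to move: even indices are A, odd indices are B."""
    return PLAYERS[turn_index % 2]


def is_halted(turn_index: int, winner: Optional[str]) -> bool:
    """Halt on a win, or once nine valid turns have been processed."""
    return winner is not None or turn_index >= MAX_TURNS
```

Because the turn counter only advances on valid actions, the player to move can always be derived from `turn_index` alone.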

---

### 4. Action Grammar (Machine‑Parsable)
Players specify grid placement commands targeting one unused cell.

**Allowed Actions**
```
[Place: <cell_id>]
```

**Cell IDs**
Valid values: `A1, A2, A3, B1, B2, B3, C1, C2, C3` (Rows A–C, Columns 1–3)

**Formal Pattern (Regex)**
`^\[Place:\s*(A|B|C)(1|2|3)\]$`

**Examples**
- **Valid:** `[Place: B2]` → Places player’s beacon in the center cell.
- **Invalid Examples:**
  - `[place: B2]` → Lowercase `place` does not match the case-sensitive `Place` token.
  - `[Place: D1]` → `D1` not in allowed grid range.
  - `[Deploy: A1]` → Invalid action token.
  - `[Place: B2 extra]` → Extra text violates strict grammar.

All player outputs will be wrapped in `\boxed{{…}}`. The implementation extracts the inner `[Place: X#]` command and validates it against the pattern above.
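
As a rough sketch of how the pattern might be applied, the snippet below compiles the regex from this section and returns the cell id for a valid command. The helper name `parse_place` is hypothetical. It uses `re.fullmatch`, which is an assumption on top of the spec: it also rejects a trailing newline, which `$` alone would tolerate.

```python
import re
from typing import Optional

# Pattern copied from the action grammar; matching is case-sensitive.
ACTION_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")


def parse_place(command: str) -> Optional[str]:
    """Return the cell id (e.g. "B2") for a valid command, else None."""
    # fullmatch requires the whole string to match, so "[Place: B2]\n" fails.
    match = ACTION_RE.fullmatch(command)
    if match is None:
        return None
    return match.group(1) + match.group(2)


for sample in ["[Place: B2]", "[place: B2]", "[Place: D1]", "[Place: B2 extra]"]:
    print(sample, "->", parse_place(sample))
```

Note that `\s*` also accepts zero whitespace, so `[Place:B2]` is valid under the pattern as written.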

---

### 5. Game State Schema
```json
{
  "turn_index": 4,
  "active_player": "A",
  "board": {
    "A1": "Blue",
    "A2": null,
    "A3": "Crimson",
    "B1": "Blue",
    "B2": "Crimson",
    "B3": null,
    "C1": null,
    "C2": null,
    "C3": null
  },
  "player_symbols": {
    "A": "Blue",
    "B": "Crimson"
  },
  "move_history": [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"}
  ],
  "winner": null,
  "is_draw": false,
  "observations": {
    "A": "Text transcript of latest game state for Alpha",
    "B": "Text transcript of latest game state for Beta"
  },
  "seed": 42
}
```
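
One way to sanity-check a state object is to replay its `move_history` and compare the result with the stored `board` and `turn_index`. The sketch below assumes the field names from the schema; the `replay` helper and its error messages are hypothetical, and it assumes every history entry is already a well-formed `[Place: X#]` action.

```python
from typing import Dict, List, Optional, Tuple

CELLS = [row + col for row in "ABC" for col in "123"]
SYMBOLS = {"A": "Blue", "B": "Crimson"}

Board = Dict[str, Optional[str]]


def replay(move_history: List[Dict[str, str]]) -> Tuple[Board, int, str]:
    """Rebuild (board, turn_index, player_to_move) from a move history."""
    board: Board = {cell: None for cell in CELLS}
    for turn, move in enumerate(move_history):
        expected = "AB"[turn % 2]  # Player A moves at even turn indices
        if move["player"] != expected:
            raise ValueError(f"turn {turn}: expected player {expected}")
        # "[Place: A1]" -> "A1" (history entries are assumed well-formed)
        cell = move["action"][len("[Place:"):-1].strip()
        if cell not in board or board[cell] is not None:
            raise ValueError(f"turn {turn}: illegal cell {cell!r}")
        board[cell] = SYMBOLS[move["player"]]
    turn_index = len(move_history)
    return board, turn_index, "AB"[turn_index % 2]


history = [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"},
]
board, turn_index, to_move = replay(history)
```

Since the game is deterministic, the board and turn index are fully determined by the history, so storing them is a convenience rather than a necessity.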

---

### 6. Initialization Rules
- `reset(seed)` initializes an empty 3×3 board with all cells `null`.
- The turn index resets to `0` with `active_player = "A"`.
- The same seed always ensures that turn order, board labeling, and any deterministic tie logic behave identically.
- Both players receive an onboarding observation describing:
  - Empty StarGrid layout
  - Their color and symbol
  - Instructions and the legal action syntax
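
A minimal sketch of `reset(seed)` under these rules might look as follows. The state keys follow the schema in Section 5; the onboarding wording and the `onboarding` helper are placeholders, not the final prompt text.

```python
from typing import Any, Dict

CELLS = [row + col for row in "ABC" for col in "123"]
PLAYER_SYMBOLS = {"A": "Blue", "B": "Crimson"}
NAMES = {"A": "Navigator Alpha", "B": "Navigator Beta"}


def onboarding(player: str) -> str:
    """Placeholder onboarding text: layout, color, and action syntax."""
    return (
        f"You are {NAMES[player]}, playing {PLAYER_SYMBOLS[player]}.\n"
        "The StarGrid is empty: rows A-C, columns 1-3.\n"
        "Act with [Place: <cell_id>], for example [Place: B2]."
    )


def reset(seed: int) -> Dict[str, Any]:
    """Return a fresh state; the seed is stored but no random draws occur."""
    return {
        "turn_index": 0,
        "active_player": "A",
        "board": {cell: None for cell in CELLS},
        "player_symbols": dict(PLAYER_SYMBOLS),
        "move_history": [],
        "winner": None,
        "is_draw": False,
        "observations": {p: onboarding(p) for p in ("A", "B")},
        "seed": seed,
    }
```

Two calls with the same seed return equal states, which is the reproducibility property this section asks for.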

---

### 7. Validation and Error Handling
- Upon receiving a player move, extract the content inside `\boxed{{}}` using `_extract_answer_content`.
- Validate against the regex `^\[Place:\s*(A|B|C)(1|2|3)\]$`.
- Check that the specified cell is unoccupied.
- **Invalid Move Reasons**
  - `"MalformedAction"`: Does not match required pattern.
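  - `"CellOutOfRange"`: Coordinate is not one of the StarGrid labels (e.g. `D1`). Because the strict regex already rejects such coordinates, reporting this reason separately from `"MalformedAction"` requires a looser shape check first.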
  - `"CellOccupied"`: Target cell already taken.
  - `"NotYourTurn"`: Attempt to act out of sequence, for example when it is the other player's turn or after the game has ended.
  The environment calls `set_invalid_move(reason)` with a human-readable reason. Handling stays deterministic: depending on policy, the turn is forfeited or the game is treated as a draw.
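
The checks above could be ordered as in the sketch below, which returns one of the four reason codes or `None` for a legal move. The function name `invalid_reason` and the looser `LOOSE_RE` pattern are assumptions made for illustration: the loose pattern lets a well-shaped but off-grid coordinate such as `D1` be reported as `"CellOutOfRange"` rather than `"MalformedAction"`.

```python
import re
from typing import Dict, Optional

# Strict grammar from the spec, plus a hypothetical looser shape check.
STRICT_RE = re.compile(r"\[Place:\s*(A|B|C)(1|2|3)\]")
LOOSE_RE = re.compile(r"\[Place:\s*([A-Z])([0-9])\]")


def invalid_reason(
    command: str,
    board: Dict[str, Optional[str]],
    player: str,
    active_player: str,
    game_over: bool = False,
) -> Optional[str]:
    """Return a reason code for an invalid move, or None if it is legal."""
    if game_over or player != active_player:
        return "NotYourTurn"
    loose = LOOSE_RE.fullmatch(command)
    if loose is None:
        return "MalformedAction"
    if STRICT_RE.fullmatch(command) is None:
        return "CellOutOfRange"  # right shape, but not a StarGrid label
    cell = loose.group(1) + loose.group(2)
    if board[cell] is not None:
        return "CellOccupied"
    return None
```

The returned code would then be passed to `set_invalid_move(reason)`; what happens to the turn afterwards is left to the policy mentioned above.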

---

### 8. Terminal Conditions and Scoring
**Checks run each turn immediately after a valid beacon is placed:**
1. **Victory Check** – If the beacons of the player who just moved form any of the eight winning line patterns, set `winner` to that player and terminate the game.
2. **Draw Check** – If no empty cells remain and no winner exists, set `is_draw = true`.
3. **Scoring** –
   - Win: `+1` score for winner, `0` for loser.
   - Draw: `0.5` each as tie credit (for potential series mode).

**Tie‑Break Procedure**
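A single placement can complete two lines at once (for example, a corner beacon that finishes both a row and a column). Both lines belong to the same player, so the winner is unambiguous; the first detected alignment pattern is recorded deterministically.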
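
The victory, draw, and scoring rules can be sketched as below. The line ordering (rows, then columns, then diagonals) is an assumption; any fixed order gives a deterministic "first detected" pattern. The names `WIN_LINES`, `find_line`, and `check_terminal` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Board = Dict[str, Optional[str]]

# The eight winning lines, in a fixed (assumed) detection order.
WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),  # rows
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),  # columns
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),                      # diagonals
]


def find_line(board: Board, symbol: str) -> Optional[Tuple[str, str, str]]:
    """Return the first winning line fully held by `symbol`, if any."""
    for line in WIN_LINES:
        if all(board[cell] == symbol for cell in line):
            return line
    return None


def check_terminal(board: Board, mover: str, symbols: Dict[str, str]) -> Dict[str, Any]:
    """Run the victory check, then the draw check, for the player who just moved."""
    result: Dict[str, Any] = {
        "winner": None, "is_draw": False, "scores": {"A": 0.0, "B": 0.0},
    }
    if find_line(board, symbols[mover]) is not None:
        result["winner"] = mover
        result["scores"][mover] = 1.0
    elif all(value is not None for value in board.values()):
        result["is_draw"] = True
        result["scores"] = {"A": 0.5, "B": 0.5}
    return result
```

Only the mover's symbol needs checking, because the opponent cannot gain a line on a turn in which they placed nothing.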

---

### 9. Player Prompt Specification
Each player receives a structured prompt reflecting the current board and legal moves.

**Prompt Outline**

> **Identity Blurb:**
> You are a star navigator placing energy beacons on a galactic grid. Each cell you claim radiates your color’s energy. The goal is to align three of your beacons in a line before the opponent.

> **Current Board State:**
> - Display a 3×3 grid with coordinates and current occupancy.

> **Your Color:** Blue or Crimson
> **Turn Information:** Which player moves next (`Navigator Alpha` or `Navigator Beta`)

> **Allowed Actions:**
> Format: `[Place: <cell_id>]`, where `<cell_id>` ∈ {A1,…,C3} and the cell must be empty.
> You must wrap your selected action inside `\boxed{{}}` at the end of your message.

> **Response Format:**
> You may reason about your move, then output your final choice within `\boxed{{}}`.

**Few‑Shot Examples**

```
Example valid response:
I will claim the center of the grid to control diagonals.
\boxed{{[Place: B2]}}

Example invalid response:
I think I'll move now.
\boxed{{[Move: B2]}}  ← "Move" not a valid token.
```

The function `_extract_answer_content(self, action: str) -> str` will remove `\boxed{{}}` wrappers and yield `[Place: X#]` for validation.
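
A possible shape for this extraction step is sketched below as a module-level function rather than a method. Several choices here are assumptions, not part of the spec: the pattern accepts both `\boxed{...}` and the doubled-brace form `\boxed{{...}}`, the last boxed span in the message wins, and an empty string is returned when no wrapper is found so that validation fails downstream.

```python
import re

# Matches \boxed{...} and \boxed{{...}}; the inner content is captured lazily.
BOXED_RE = re.compile(r"\\boxed\{\{?(.*?)\}?\}", re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the content of the last boxed span, or "" if none is present."""
    matches = BOXED_RE.findall(action)
    if not matches:
        return ""
    return matches[-1].strip()


reply = "I will claim the center of the grid.\n\\boxed{{[Place: B2]}}"
command = extract_answer_content(reply)
```

The extracted string is not validated here; a token such as `[Move: B2]` passes through unchanged and is rejected later by the action grammar.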

---

### 10. API Mapping Plan

- **`reset(seed)`**
  - Sets initial empty `board`, `turn_index=0`, and seeds RNG for determinism.
  - Returns the initial observations for both players (Navigator Alpha and Navigator Beta).

- **`step(player_action)`**
  - Extracts action token with `_extract_answer_content`.
  - Validates syntax and target cell availability.
  - Updates `board`, appends to `move_history`, increments `turn_index`.
  - After update, executes terminal checks (victory or draw).
  - Produces new observations describing updated board state.

- **`_generate_player_prompt(player_id)`**
  - Compiles textual description of board, current scores, and open cells.
  - Lists permitted `[Place: <cell_id>]` choices.
  - Concludes with directive: *Put your final answer within \boxed{{}} at the end of your response.*

All actions and resultant board states are deterministic given identical seeds and action sequences.
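
To illustrate the prompt-building step, the sketch below renders the board as text and lists the open cells as legal actions. It is a standalone approximation of `_generate_player_prompt`: the grid layout, the helper names `render_board` and `open_cells`, and the omission of current scores are all assumptions.

```python
from typing import Dict, List, Optional

ROWS, COLS = "ABC", "123"
NAMES = {"A": "Navigator Alpha", "B": "Navigator Beta"}
SYMBOLS = {"A": "Blue", "B": "Crimson"}

Board = Dict[str, Optional[str]]


def render_board(board: Board) -> str:
    """Render the grid with row/column labels; empty cells show as '.'."""
    lines = ["   " + "".join(col.ljust(9) for col in COLS).rstrip()]
    for row in ROWS:
        cells = "".join((board[row + col] or ".").ljust(9) for col in COLS)
        lines.append(f"{row}  {cells}".rstrip())
    return "\n".join(lines)


def open_cells(board: Board) -> List[str]:
    """Return the ids of all unoccupied cells in row-major order."""
    return [r + c for r in ROWS for c in COLS if board[r + c] is None]


def generate_player_prompt(player_id: str, board: Board, active_player: str) -> str:
    """Assemble identity, board, turn information, and the legal actions."""
    legal = ", ".join(f"[Place: {cell}]" for cell in open_cells(board))
    return "\n".join([
        "You are a star navigator placing energy beacons on a galactic grid.",
        f"Your color: {SYMBOLS[player_id]}",
        f"Next to move: {NAMES[active_player]}",
        "Current board:",
        render_board(board),
        f"Allowed actions: {legal}",
        "Put your final answer within \\boxed{{}} at the end of your response.",
    ])
```

Listing only the open cells keeps the prompt consistent with the `"CellOccupied"` rule, since an occupied cell is never offered as a choice.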

---

### 11. Copy‑Check Against the Example
- The environment, terminology, and objective are **entirely original**.
- There is **no negotiation**, **no trading**, **no resource exchange**, and **no alignment with any bargaining mechanics** from the example environment.
- Entities (“Navigator Alpha/Beta,” “energy beacons,” “StarGrid”) and game state keys (`board`, `player_symbols`, `move_history`, etc.) are unique to this design.
- The theme is cosmic grid conquest, **not** any prior example domain.

---

**End of Design Document – “StarGrid Duel”**