# **Game Design Document: “Orbital Align” (Deterministic Turn-Based Strategy Inspired by Tic-Tac-Toe)**

---

## 1. Concept Paragraph

**Setting & Theme:**
In *Orbital Align*, two rival star captains compete to align their fleets of orbital satellites across a 3×3 planetary grid suspended around a dying star. Unlike classic tic-tac-toe, this version reimagines the board as orbital nodes where each satellite placement represents a strategic claim of spatial control. The goal is to align three satellites in a row—horizontally, vertically, or diagonally—before the opponent does.
**Core action tokens:**
`[Deploy:x,y]` (to place a satellite on coordinates), and `[Scan]` (forfeit placement to reveal the current grid state).
This design is *completely unrelated* to any previous negotiation or resource trading example. It uses a new setting, terminology, and objectives.

---

## 2. Roles and Win Condition

**Roles:**
- **Player A (Commander Solis)** and **Player B (Commander Nyx)** each command a distinct orbital fleet.
- Each player’s satellite is marked distinctly (`S` for Solis, `N` for Nyx).

**Win Condition:**
- A player wins if they align **three of their satellites** consecutively in any row, column, or diagonal.
- If all nine grid cells are filled without a winning alignment, the result is a **draw**.

**Loss Condition:**
- A player loses if the opponent achieves an alignment before them.
- A player also loses immediately if they perform an **invalid action** that cannot be corrected within the same turn.

---

## 3. Turn Structure and Determinism

- The game proceeds in **alternating turns**, starting with Commander Solis (Player A).
- **Each turn**: the current player chooses one action (`Deploy` or `Scan`).
- **Deployment limit**: 9 (the grid has 9 cells, so at most nine `Deploy` actions can succeed; a `Scan` consumes a turn without filling a cell).
- The environment stores a reproducible **random seed**. The base game has no stochastic actions, but seeding ensures deterministic replay if future extensions add random elements.
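The alternation and seeding rules reduce to a few lines. The sketch below is a minimal Python illustration, assuming `turn_count` counts *completed* turns (0 right after `reset`); `player_for_turn` and `make_rng` are hypothetical helper names, not part of the specified API.

```python
import random

# Turn order, assuming turn_count counts completed turns (0 right after reset).
PLAYERS = ("Commander Solis", "Commander Nyx")


def player_for_turn(turn_count: int) -> str:
    """Return the commander who acts after `turn_count` completed turns.

    Solis takes turns 1, 3, 5, ... and Nyx takes turns 2, 4, 6, ...
    """
    return PLAYERS[turn_count % 2]


def make_rng(seed: int) -> random.Random:
    """Create a private generator from the stored seed.

    The base game never draws from it; any future random extension would
    replay identically for the same seed.
    """
    return random.Random(seed)


if __name__ == "__main__":
    print([player_for_turn(t) for t in range(4)])
    print(make_rng(42).random() == make_rng(42).random())
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the game's stream isolated from any other code that touches the global generator.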

---

## 4. Action Grammar (Machine-Parseable)

**Permitted Action Tokens**

| Action | Meaning | Formal Regex | Example Valid | Example Invalid | Reason Invalid |
|:--|:--|:--|:--|:--|:--|
| `[Deploy:x,y]` | Place a satellite at coordinates (x,y) where x,y ∈ {1,2,3} | `^\[Deploy:([1-3]),([1-3])\]$` | `[Deploy:2,3]` | `[Deploy:4,1]` | 4 is outside the valid range |
| `[Scan]` | View the current orbital grid instead of placing | `^\[Scan\]$` | `[Scan]` | `[ScanGrid]` | Incorrect token name |

**Rules:**
- Coordinates (x,y) index the grid by row x (1 = top) and column y (1 = left): (1,1) is top-left, (1,3) is top-right, and (3,3) is bottom-right.
- Double occupation is not allowed: a `Deploy` targeting an occupied node is invalid.
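A minimal Python sketch of parsing the two tokens, assuming x selects the row and y the column (consistent with the top-right example `[Deploy:1,3]` in the prompt specification). `parse_action` is a hypothetical helper; it returns 0-based indices into `board`.

```python
import re
from typing import Optional, Tuple, Union

# Action-grammar patterns with capturing groups so the digits can be read back.
DEPLOY_RE = re.compile(r"^\[Deploy:([1-3]),([1-3])\]$")
SCAN_RE = re.compile(r"^\[Scan\]$")

Action = Union[str, Tuple[int, int]]


def parse_action(text: str) -> Optional[Action]:
    """Return "scan", a 0-based (row, col) tuple, or None if `text` is invalid.

    fullmatch() is used because `$` alone would also accept a trailing newline.
    """
    if SCAN_RE.fullmatch(text):
        return "scan"
    match = DEPLOY_RE.fullmatch(text)
    if match is None:
        return None
    x, y = int(match.group(1)), int(match.group(2))
    # x picks the row (1 = top) and y the column (1 = left); shift to 0-based.
    return (x - 1, y - 1)


if __name__ == "__main__":
    for token in ["[Deploy:2,3]", "[Deploy:4,1]", "[Scan]", "[ScanGrid]"]:
        print(token, "->", parse_action(token))
```

Using `re.fullmatch` rather than `re.match` matters here: `$` also matches just before a trailing newline, so `"[Scan]\n"` would otherwise be accepted.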

---

## 5. Game State Schema

Example serialized game state:

```json
{
  "turn_count": 4,
  "current_player": "Commander Solis",
  "board": [
    ["S", "N", " "],
    [" ", "S", " "],
    ["N", " ", " "]
  ],
  "players": {
    "Commander Solis": {
      "symbol": "S",
      "actions_taken": ["[Deploy:1,1]", "[Deploy:2,2]"]
    },
    "Commander Nyx": {
      "symbol": "N",
      "actions_taken": ["[Deploy:1,2]", "[Deploy:3,1]"]
    }
  },
  "winner": null,
  "is_terminal": false,
  "last_action": "[Deploy:3,1]",
  "observation_log": [
    "Commander Solis deployed to 1,1",
    "Commander Nyx deployed to 1,2",
    "Commander Solis deployed to 2,2",
    "Commander Nyx deployed to 3,1"
  ],
  "seed": 42
}
```
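One way to pin the schema down in code is a pair of `TypedDict` declarations plus a consistency check that replays each player's `Deploy` actions onto an empty grid. This is a sketch: `PlayerState`, `GameState`, and `board_matches_history` are hypothetical names, and the check assumes x = row, y = column.

```python
from typing import Dict, List, Optional, TypedDict


class PlayerState(TypedDict):
    symbol: str               # "S" for Solis, "N" for Nyx
    actions_taken: List[str]  # raw tokens such as "[Deploy:1,1]" or "[Scan]"


class GameState(TypedDict):
    turn_count: int
    current_player: str
    board: List[List[str]]    # board[x-1][y-1] is "S", "N" or " "
    players: Dict[str, PlayerState]
    winner: Optional[str]
    is_terminal: bool
    last_action: Optional[str]
    observation_log: List[str]
    seed: int


def board_matches_history(state: GameState) -> bool:
    """Replay every Deploy in actions_taken and compare with the stored board.

    Returns False if two deployments claim the same cell or the replayed
    grid differs from state["board"].
    """
    replay = [[" "] * 3 for _ in range(3)]
    for info in state["players"].values():
        for action in info["actions_taken"]:
            if not action.startswith("[Deploy:"):
                continue  # [Scan] never changes the board
            x, y = (int(v) for v in action[len("[Deploy:"):-1].split(","))
            if replay[x - 1][y - 1] != " ":
                return False  # double occupation
            replay[x - 1][y - 1] = info["symbol"]
    return replay == state["board"]
```

A check like this is cheap to run after every `step` in tests and catches examples where `actions_taken`, `board`, and `turn_count` drift apart.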

---

## 6. Initialization Rules

- **Board**: Empty 3×3 grid represented as a list of lists containing `" "`.
- **Starting player**: Commander Solis always starts.
- **Seeding**: Random seed (e.g., `seed=42`) stored in `game_state` for deterministic replay.
- **Onboarding observations**:
  Upon `reset`, each player receives:
  - The empty grid state.
  - Instructions on how to deploy satellites and when the game concludes.
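A sketch of `reset` building the initial `game_state`, assuming `last_action` is `null` before the first move. Onboarding text is omitted here; `SYMBOLS` and `empty_board` are hypothetical helpers.

```python
from typing import Any, Dict, List

# Hypothetical roster; Section 2 fixes the symbols.
SYMBOLS = {"Commander Solis": "S", "Commander Nyx": "N"}


def empty_board() -> List[List[str]]:
    # Build each row separately: [[" "] * 3] * 3 would alias one row three times.
    return [[" " for _ in range(3)] for _ in range(3)]


def reset(seed: int = 42) -> Dict[str, Any]:
    """Return a fresh game_state; Solis always moves first."""
    return {
        "turn_count": 0,
        "current_player": "Commander Solis",
        "board": empty_board(),
        "players": {
            name: {"symbol": symbol, "actions_taken": []}
            for name, symbol in SYMBOLS.items()
        },
        "winner": None,
        "is_terminal": False,
        "last_action": None,  # assumption: null until the first action
        "observation_log": [],
        "seed": seed,
    }
```

The row-by-row construction matters: with the aliased form, a single deployment would appear in all three rows at once.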

---

## 7. Validation and Error Handling

**Validation checks in order:**
1. Verify that the extracted content matches one of the valid action patterns.
2. For `[Deploy:x,y]`, ensure:
   - x, y within range 1–3.
   - Target cell is empty.
3. For `[Scan]`, ensure no other content is appended.
4. If any of these checks fails, call
   `set_invalid_move(player, reason)`
   with one of:
   - `"Malformed action syntax"`
   - `"Coordinates out of range"`
   - `"Target cell occupied"`
   - `"Unrecognized action token"`

Action extraction must strip the wrapping `\boxed{...}`, leaving only the inner content for validation.
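The checks above can be sketched as a single function that returns either a parsed action or one of the four `set_invalid_move` reasons. This is a hedged illustration rather than the specified implementation: `validate` is a hypothetical name, and because the strict Section 4 regex already rejects digits outside 1 to 3, this sketch first matches a deliberately looser pattern (`[0-9]+`) so that the `"Coordinates out of range"` reason in step 2 is reachable. Telling `"Unrecognized action token"` apart from `"Malformed action syntax"` by the bracketed token name is also an assumption.

```python
import re
from typing import List, Optional, Tuple, Union

# The response must end with \boxed{...}; the braces may not nest.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}\s*\Z")
# Generic "[Name...]" shape, used to tell unknown tokens from broken syntax.
TOKEN_RE = re.compile(r"\[([A-Za-z]+)(.*)\]")
# Deliberately looser than Section 4 so out-of-range digits get their own reason.
LOOSE_DEPLOY_RE = re.compile(r"\[Deploy:([0-9]+),([0-9]+)\]")

Action = Union[str, Tuple[int, int]]


def validate(response: str, board: List[List[str]]) -> Tuple[Optional[Action], Optional[str]]:
    """Return (action, None) if legal, else (None, reason) for set_invalid_move."""
    boxed = BOXED_RE.search(response)
    if boxed is None:
        return None, "Malformed action syntax"
    content = boxed.group(1).strip()

    token = TOKEN_RE.fullmatch(content)
    if token is None:
        return None, "Malformed action syntax"
    name = token.group(1)

    if name == "Scan":
        # Step 3: nothing may follow the token name.
        if content == "[Scan]":
            return "scan", None
        return None, "Malformed action syntax"
    if name != "Deploy":
        return None, "Unrecognized action token"

    coords = LOOSE_DEPLOY_RE.fullmatch(content)
    if coords is None:
        return None, "Malformed action syntax"
    x, y = int(coords.group(1)), int(coords.group(2))
    if not (1 <= x <= 3 and 1 <= y <= 3):
        return None, "Coordinates out of range"
    if board[x - 1][y - 1] != " ":
        return None, "Target cell occupied"
    return (x - 1, y - 1), None
```

Returning the reason instead of calling `set_invalid_move` directly keeps the validator pure, so `step` decides whether to allow a retry.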

---

## 8. Terminal Conditions and Scoring

**After each move**, the system checks:

1. **Win Check:**
   - Rows, columns, and both diagonals are scanned for `['S', 'S', 'S']` or `['N', 'N', 'N']`.
   - The matching player is recorded as `winner`.
2. **Draw Check:**
   - If all nine cells are occupied and there is no winner ⇒ `"DRAW"`.
3. **Score Rules:**
   - Winner = 1, loser = 0.
   - Draw = 0.5 each.

Scoring is fully deterministic: no randomness or hidden state is involved.
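The win, draw, and scoring rules can be sketched in Python as follows. The draw test checks that every cell is occupied, matching the Section 2 definition; this coincides with `turn_count == 9` only when no turn was spent on `[Scan]`. `WIN_LINES`, `find_winner`, `is_draw`, and `scores` are hypothetical names, and scores are keyed by symbol for brevity.

```python
from typing import Dict, List, Optional, Tuple

Board = List[List[str]]

# The eight winning lines: three rows, three columns, two diagonals.
WIN_LINES: List[List[Tuple[int, int]]] = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def find_winner(board: Board) -> Optional[str]:
    """Return "S" or "N" if that symbol fills a winning line, else None."""
    for line in WIN_LINES:
        first = board[line[0][0]][line[0][1]]
        if first != " " and all(board[r][c] == first for r, c in line):
            return first
    return None


def is_draw(board: Board) -> bool:
    """A draw: no winning line and every cell occupied."""
    return find_winner(board) is None and all(cell != " " for row in board for cell in row)


def scores(winner: Optional[str]) -> Dict[str, float]:
    """Map a terminal result to scores; winner=None means a draw."""
    if winner is None:
        return {"S": 0.5, "N": 0.5}
    return {symbol: (1.0 if symbol == winner else 0.0) for symbol in ("S", "N")}
```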

---

## 9. Player Prompt Specification

**Prompt Outline:**

> **IDENTITY BLURB:**
> You are a star commander controlling a fleet of satellites orbiting a dying star. Your mission is to align three of your satellites in a row across the 3×3 orbital grid before your rival does.
>
> **CURRENT STATE:**
> - The board shows Commander Solis’s satellites as `S` and Commander Nyx’s as `N`.
> - Empty cells are blank spaces.
>
> **AVAILABLE ACTIONS:**
> - `[Deploy:x,y]` → Place your satellite at coordinates (x,y) where x,y ∈ {1,2,3}.
> - `[Scan]` → Forfeit placement this turn to inspect the full orbital map.
>
> **FORMAT RULES:**
> - Each response must end with: `\boxed{{<action>}}`
> - Example of valid response:
>   ```
>   I will secure the top-right orbit next.
>   \boxed{{[Deploy:1,3]}}
>   ```
> - Example of invalid response:
>   ```
>   Let’s attack next time.
>   [Deploy:1,3]
>   ```
>   (Because it's missing `\boxed{{}}`.)
>
> **REMINDERS:**
> - You cannot deploy on an occupied orbit.
> - The game will end immediately if three satellites align or all nine orbits are filled.

All dialogue and moves are appended to the shared `observation_log`.
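A sketch of how `_generate_player_prompt` might assemble this text, assuming the board is rendered with row numbers (x) down the side and column numbers (y) across the top. `render_board` and `build_prompt` are hypothetical helpers. If the template is a Python f-string or `str.format` string, the doubled braces in `\boxed{{...}}` are escapes and render as single braces, so the player sees `\boxed{[Deploy:1,3]}`.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Draw the grid with column numbers (y) on top and row numbers (x) down the side."""
    header = "    " + "   ".join(str(y) for y in range(1, 4))
    rows = [f"{x}   " + " | ".join(row) for x, row in enumerate(board, start=1)]
    return "\n".join([header] + rows)


def build_prompt(player: str, symbol: str, board: List[List[str]], log: List[str]) -> str:
    """Assemble the Section 9 prompt for one commander (hypothetical helper)."""
    history = "\n".join(log) if log else "(no moves yet)"
    return (
        f"You are {player}, a star commander whose satellites are marked {symbol}. "
        "Align three of your satellites in a row across the 3x3 orbital grid "
        "before your rival does.\n\n"
        f"CURRENT STATE:\n{render_board(board)}\n\n"
        f"OBSERVATION LOG:\n{history}\n\n"
        "AVAILABLE ACTIONS:\n"
        "- [Deploy:x,y]: place a satellite at row x, column y, where x,y are in {1,2,3}.\n"
        "- [Scan]: forfeit placement this turn to inspect the full orbital map.\n\n"
        # Inside an f-string, {{ and }} render as literal single braces.
        f"FORMAT RULES:\nEnd your response with \\boxed{{<action>}}, "
        f"for example \\boxed{{[Deploy:1,3]}}.\n\n"
        "REMINDERS:\n- You cannot deploy on an occupied orbit.\n"
        "- The game ends when three satellites align or all nine orbits are filled."
    )
```

The function only reads its arguments, matching the Section 10 note that prompt generation never modifies state.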

---

## 10. API Mapping Plan

| API Method | Purpose | Primary Read/Write | Terminal logic |
|-------------|----------|-------------------|----------------|
| `reset(seed)` | Initializes the grid, assigns symbols, clears logs, and sets starting player. | Writes entire `game_state`. | Returns initial observation and seed confirmation. |
| `step(action)` | Validates player’s boxed action, updates the grid/state, switches turns. | Reads `current_player`, `board`; writes updates, logs. | Runs win/draw checks after every move; sets `is_terminal`, `winner`. |
| `_generate_player_prompt(player)` | Builds textual prompt shown above, embedding the latest board and prior logs. | Reads from `board`, `observation_log`, and `current_player`. | Does not modify state; only generates text. |

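On an invalid action, `step` calls `set_invalid_move(player, reason)` and lets the same player retry; if the action still cannot be corrected within that turn, the game ends with that player's loss (see Section 2).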
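The table and the invalid-move policy can be combined into a compact environment class. This is a sketch under several assumptions: `OrbitalAlignEnv` and its `max_invalid_attempts` retry budget are hypothetical, the move grammar is the strict Section 4 regex (so out-of-range coordinates are reported as `"Malformed action syntax"`), and a draw is recorded as `is_terminal` with `winner` left `null`.

```python
import re
from typing import Any, Dict

SYMBOLS = {"Commander Solis": "S", "Commander Nyx": "N"}
ORDER = ("Commander Solis", "Commander Nyx")
# Strict Section 4 grammar inside a trailing \boxed{...}.
ACTION_RE = re.compile(r"\\boxed\{(\[Deploy:([1-3]),([1-3])\]|\[Scan\])\}\s*\Z")
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


class OrbitalAlignEnv:
    """Minimal reset/step loop; max_invalid_attempts is a hypothetical retry budget."""

    def __init__(self, max_invalid_attempts: int = 2) -> None:
        self.max_invalid_attempts = max_invalid_attempts
        self.invalid_attempts = 0
        self.state: Dict[str, Any] = {}

    def reset(self, seed: int = 42) -> Dict[str, Any]:
        self.invalid_attempts = 0
        self.state = {
            "turn_count": 0, "current_player": ORDER[0],
            "board": [[" " for _ in range(3)] for _ in range(3)],
            "players": {n: {"symbol": s, "actions_taken": []} for n, s in SYMBOLS.items()},
            "winner": None, "is_terminal": False, "last_action": None,
            "observation_log": [], "seed": seed,
        }
        return self.state

    def set_invalid_move(self, player: str, reason: str) -> None:
        self.state["observation_log"].append(f"{player} made an invalid move: {reason}")
        self.invalid_attempts += 1
        if self.invalid_attempts >= self.max_invalid_attempts:
            # Retries exhausted: the offending player loses (Section 2).
            self.state["winner"] = ORDER[1 - ORDER.index(player)]
            self.state["is_terminal"] = True

    def step(self, response: str) -> Dict[str, Any]:
        s = self.state
        if s["is_terminal"]:
            raise RuntimeError("game is over; call reset()")
        player = s["current_player"]
        m = ACTION_RE.search(response)
        if m is None:
            self.set_invalid_move(player, "Malformed action syntax")
            return s  # same player retries
        action = m.group(1)
        if m.group(2) is not None:  # Deploy
            x, y = int(m.group(2)) - 1, int(m.group(3)) - 1
            if s["board"][x][y] != " ":
                self.set_invalid_move(player, "Target cell occupied")
                return s
            s["board"][x][y] = SYMBOLS[player]
            s["observation_log"].append(f"{player} deployed to {x + 1},{y + 1}")
        else:
            s["observation_log"].append(f"{player} scanned the grid")
        s["players"][player]["actions_taken"].append(action)
        s["last_action"] = action
        s["turn_count"] += 1
        self.invalid_attempts = 0
        # Only the mover's symbol can have just completed a line.
        if any(all(s["board"][r][c] == SYMBOLS[player] for r, c in line) for line in WIN_LINES):
            s["winner"], s["is_terminal"] = player, True
        elif all(cell != " " for row in s["board"] for cell in row):
            s["is_terminal"] = True  # draw: winner stays None
        else:
            s["current_player"] = ORDER[s["turn_count"] % 2]
        return s
```

An illegal response leaves `current_player` and `turn_count` unchanged, so the same commander simply calls `step` again; a successful action resets the retry counter.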

---

## 11. Copy-Check Against Example

All entity names (**Commander Solis**, **Commander Nyx**, **satellites**, **orbital grid**) and thematic terms are **original** and unrelated to any example negotiation or deal-making scenario. The game’s objective (aligning satellites on a 3×3 grid) derives from *tic-tac-toe mechanics* but is expressed in a wholly new narrative context.
All `game_state` keys (`board`, `winner`, `observation_log`, `symbol`, etc.) were defined specifically for *Orbital Align*; none are borrowed from any trading, diplomacy, or economic system.