diff --git a/environment.md b/environment.md
new file mode 100644
index 0000000..db195e7
--- /dev/null
+++ b/environment.md
@@ -0,0 +1,255 @@
+# Turn-Based TextArena Design Document: **"GlyphGrid Duel"**
+
+*(Design document for a deterministic, turn-based environment inspired by tic-tac-toe mechanics, but set in a completely original setting, terminology, and data schema.)*
+
+---
+
+## 1. Concept Paragraph
+
+**Game Title:** *GlyphGrid Duel*
+
+In the ancient halls of the Archivists, two rival Scribes compete to inscribe mystical glyphs into a sacred 3×3 grid called the “Runeboard.” Each Scribe alternates turns to etch one of their signature glyphs, **Solar** (`S`) or **Lunar** (`L`), into an empty rune slot. The goal is to align three of one’s glyphs consecutively across a row, column, or diagonal, representing mastery of the grid’s equilibrium energies. Although the core structure echoes a placement strategy game, *GlyphGrid Duel* is **unrelated to any negotiation, trade, or dialogue-based environment**. It focuses solely on deterministic pattern control, tactical foresight, and spatial reasoning.
+
+---
+
+## 2. Roles and Win Condition
+
+- **Players:** Two players:
+  - **Scribe Solar**: uses the glyph `"S"`.
+  - **Scribe Lunar**: uses the glyph `"L"`.
+
+- **Objective:** Align three identical glyphs in a straight line across the 3×3 Runeboard.
+
+- **Win Condition:**
+  - A player wins immediately upon creating a line (horizontal, vertical, or diagonal) consisting of their own glyphs.
+  - If all cells are filled and no player has a line, the result is a **Draw**.
+
+- **Loss Condition:**
+  - A player loses if the opponent achieves a winning alignment first.
+
+- **Draw Condition:**
+  - The Runeboard is full, and no completed line exists.
+
+---
+
+## 3. Turn Structure and Determinism
+
+- The game alternates turns between Scribe Solar (first) and Scribe Lunar (second).
+- Each turn consists of one valid placement action onto an empty cell.
+- **Turn Limit:** 9, one turn per rune slot on the grid.
+- **Determinism:**
+  - No random factors after initialization.
+  - The seed passed to `reset(seed=x)` governs any optional *starting player choice* (Solar starts by default), so replaying the same seed and actions reproduces identical outcomes.
+
+---
+
+## 4. Action Grammar (Machine-Parseable)
+
+### Valid Actions
+
+Each action specifies a cell position in row-column format, using **1-based indexing**.
+
+**Format:**
+```
+[Etch: <row>, <column>]
+```
+
+- `<row>` and `<column>` are integers in `{1, 2, 3}`.
+- The cell at `(row, column)` must be unoccupied.
+
+**Regex pattern:**
+```
+^\[Etch:\s*([1-3]),\s*([1-3])\]$
+```
+
+### Examples:
+
+| Example Action | Valid? | Reason |
+|----------------|--------|--------|
+| `[Etch: 1, 3]` | ✅ | Valid coordinates. |
+| `[Etch: 3, 1]` | ✅ | Valid coordinates. |
+| `[Etch: 4, 2]` | ❌ | Row = 4 out of bounds. |
+| `[Etch (2,2)]` | ❌ | Invalid token format (missing colon; coordinates wrapped in parentheses). |
+| `[Mark: 1, 1]` | ❌ | Invalid verb token; must use “Etch”. |
+
+All player responses must be wrapped in `\boxed{{}}` during gameplay, e.g.
+`\boxed{{[Etch: 2, 1]}}`.
+
+---
+
+## 5. Game State Schema
+
+Example `game_state` at runtime (illustrative values only):
+
+```json
+{
+  "runeboard": [
+    ["S", "L", "_"],
+    ["_", "S", "_"],
+    ["L", "_", "_"]
+  ],
+  "current_player": "Solar",
+  "turn_count": 4,
+  "winner": null,
+  "is_terminal": false,
+  "last_action": "[Etch: 3, 1]",
+  "observations": {
+    "Solar": [
+      "Runeboard state after turn 4...",
+      "Lunar etched at (3,1)"
+    ],
+    "Lunar": [
+      "Runeboard state after turn 4...",
+      "Lunar etched at (3,1)"
+    ]
+  },
+  "player_symbols": {
+    "Solar": "S",
+    "Lunar": "L"
+  },
+  "seed": 42
+}
+```
+
+Keys:
+- `runeboard`: Nested list of strings (`"S"`, `"L"`, or `"_"` for empty).
+- `current_player`: Indicates whose turn it is.
+- `turn_count`: Number of turns completed.
+- `winner`: `"Solar"`, `"Lunar"`, or `null`.
+- `is_terminal`: Boolean indicating game completion.
+- `last_action`: Last validated `[Etch: r, c]`.
+- `observations`: Per-player transcript and board updates.
+- `player_symbols`: Maps each player to their glyph.
+- `seed`: Recorded so that a run can be reproduced deterministically.
+
+---
+
+## 6. Initialization Rules
+
+- On `reset(seed)`, the seed is recorded in `game_state["seed"]`.
+- Starting player defaults to **Solar** unless an optional, seed-dependent rule toggle changes it.
+- Board resets to all empty (`"_"`).
+- `turn_count = 0`, `winner = null`, `is_terminal = false`.
+- Initial observation describes the empty Runeboard:
+  ```
+  The Runeboard is empty. Each Scribe may etch a glyph using [Etch: row, col].
+  ```
+- All randomness (if ever expanded, e.g. random first player) must derive solely from the seed.
+
+---
+
+## 7. Validation and Error Handling
+
+After extracting the inner content via `_extract_answer_content`, the environment validates:
+
+1. The content matches the regex `^\[Etch:\s*([1-3]),\s*([1-3])\]$`.
+2. Target cell must be empty (`"_"`).
+3. Game must not be terminal.
+4. The acting player must match `current_player`.
+
+**Invalid Move Reasons** (passed to `set_invalid_move`):
+- `"Invalid format: must be [Etch: row, column] with row,col in 1–3."`
+- `"Out of bounds: coordinates must be between 1 and 3."`
+- `"Cell already occupied."`
+- `"Game already ended."`
+- `"Not your turn."`
+
+---
+
+## 8. Terminal Conditions and Scoring
+
+After each valid move:
+
+1. **Win Check:**
+   - If the current player’s glyph fills any row, column, or diagonal with 3 identical glyphs,
+     → `winner = current_player`, `is_terminal = True`.
+
+2. **Draw Check:**
+   - If `turn_count == 9` and `winner == null`,
+     → `is_terminal = True`, result = **Draw**.
+
+3. **Scoring:**
+   - **Win:** +1 point to winner; 0 to loser.
+   - **Draw:** 0.5 to both.
+
+4. **Tie Break:**
+   - None; draws are final.
+
+---
+
+## 9. Player Prompt Specification
+
+On each turn, `_generate_player_prompt(player_id)` provides:
+
+1. **Identity Blurb:**
+   ```
+   You are a Scribe competing to master the Runeboard through glyph alignment.
+   ```
+2. **Rules Summary:**
+   - Each player alternately etches one glyph per turn.
+   - Wins occur when three identical glyphs align (row, column, or diagonal).
+   - If all nine cells are filled without alignment, it’s a draw.
+3. **Action Instructions:**
+   - Choose one empty cell and etch your glyph.
+   - Actions **must** follow the format `[Etch: row, column]`.
+   - Place your final choice inside `\boxed{{}}`.
+
+4. **Examples:**
+   ```
+   Example valid response:
+   I will etch at the top right corner.
+   \boxed{{[Etch: 1, 3]}}
+
+   Example invalid response:
+   \boxed{{[Mark: 1, 3]}} # Reason: "Mark" is not a valid action.
+   ```
+
+5. **Information Provided Each Turn:**
+   - Current Runeboard state.
+   - Move history (summarized from observations).
+   - Which coordinates are still empty.
+
+---
+
+## 10. API Mapping Plan
+
+### **reset(seed)**
+
+- Initializes `game_state` with all keys defined above.
+- Creates empty Runeboard and resets counters.
+- Sets seed for deterministic reproduction.
+- Returns initial observation for both players describing the empty grid.
+
+### **step(player_id, action)**
+
+- Extracts `content = _extract_answer_content(action)`.
+- Validates format and move legality.
+- Updates `runeboard`, `turn_count`, `current_player`.
+- Checks for terminal condition (win or draw).
+- Records action in both players’ `observations` list.
+- Returns updated `game_state`, per-player observations, reward signals, and termination status.
+
+### **_generate_player_prompt(player_id)**
+
+- Produces textual prompt combining:
+  - Role context (Solar/Lunar)
+  - Current Runeboard depiction
+  - Legal moves list in `[Etch: r, c]` format
+  - Reminder of boxed answer format and examples
+- Enforces output rule:
+  `"Put your final answer within \boxed{{}} at the end of your response."`
+
+---
+
+## 11. Copy-Check Against the Example
+
+This design:
+- **Does not** reference or replicate the negotiation example’s mechanics, dialogue, or resources.
+- Uses **completely distinct terminology**: Scribes, Glyphs, Runeboard, Etching.
+- Involves **no negotiation, trade, or communication** mechanics.
+- Defines an objective (line alignment) wholly original to this document.
+- Keeps all `game_state` keys (`runeboard`, `current_player`, `player_symbols`, etc.) and prompts original to *GlyphGrid Duel*.
+
+---
+
+**End of Design Document for “GlyphGrid Duel.”**
\ No newline at end of file
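
The validation rules in section 7 and the terminal checks in section 8 can be sketched in Python roughly as follows. This is a minimal illustration, not the environment's actual implementation: `new_state`, `parse_etch`, `has_line`, and `apply_etch` are hypothetical helper names, the `observations` key is omitted for brevity, and checking "terminal" and "turn" before the format is an assumed ordering. Because the regex from section 4 only accepts digits 1 to 3, an out-of-range coordinate such as `[Etch: 4, 2]` surfaces here as a format error rather than a separate out-of-bounds error.

```python
import re
from typing import List, Optional, Tuple

# Regex copied from section 4 of the design document.
ETCH_RE = re.compile(r"^\[Etch:\s*([1-3]),\s*([1-3])\]$")
EMPTY = "_"

# All eight winning lines on a 3x3 board, as 0-based (row, col) triples.
LINES: List[List[Tuple[int, int]]] = (
    [[(r, c) for c in range(3)] for r in range(3)]      # rows
    + [[(r, c) for r in range(3)] for c in range(3)]    # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)


def new_state(seed: int = 0) -> dict:
    """Hypothetical reset helper: empty board, Solar to move."""
    return {
        "runeboard": [[EMPTY] * 3 for _ in range(3)],
        "current_player": "Solar",
        "turn_count": 0,
        "winner": None,
        "is_terminal": False,
        "last_action": None,
        "player_symbols": {"Solar": "S", "Lunar": "L"},
        "seed": seed,
    }


def parse_etch(content: str) -> Optional[Tuple[int, int]]:
    """Return 0-based (row, col) for a well-formed action, else None."""
    m = ETCH_RE.match(content)
    if m is None:
        return None
    return int(m.group(1)) - 1, int(m.group(2)) - 1


def has_line(board: List[List[str]], glyph: str) -> bool:
    """True if `glyph` fills any row, column, or diagonal."""
    return any(all(board[r][c] == glyph for r, c in line) for line in LINES)


def apply_etch(state: dict, player: str, content: str) -> Optional[str]:
    """Apply one move in place; return an invalid-move reason, or None if valid."""
    if state["is_terminal"]:
        return "Game already ended."
    if player != state["current_player"]:
        return "Not your turn."
    cell = parse_etch(content)
    if cell is None:
        return "Invalid format: must be [Etch: row, column] with row,col in 1–3."
    r, c = cell
    if state["runeboard"][r][c] != EMPTY:
        return "Cell already occupied."

    glyph = state["player_symbols"][player]
    state["runeboard"][r][c] = glyph
    state["turn_count"] += 1
    state["last_action"] = f"[Etch: {r + 1}, {c + 1}]"

    if has_line(state["runeboard"], glyph):
        state["winner"], state["is_terminal"] = player, True
    elif state["turn_count"] == 9:
        state["is_terminal"] = True  # draw: board full, no winner
    else:
        state["current_player"] = "Lunar" if player == "Solar" else "Solar"
    return None
```

Returning the reason string instead of raising keeps the sketch close to the document's `set_invalid_move` reasons; a real `step` would presumably forward that string to `set_invalid_move` and leave the state untouched.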