diff --git a/environment.md b/environment.md
new file mode 100644
index 0000000..ca83b7b
--- /dev/null
+++ b/environment.md
@@ -0,0 +1,230 @@
+# **Game Design Document: “Crystal Grid”**
+
+---
+
+## 1. Concept Paragraph
+
+“**Crystal Grid**” is a deterministic, turn-based strategy duel in which two mystic architects compete to align charged crystals on a sacred 3×3 energy grid. Each player channels a unique elemental charge, Solar (Player 1) or Lunar (Player 2), and infuses crystals into empty nodes. The goal is to align three of one’s crystals horizontally, vertically, or diagonally to create a stable energy conduit. The only available action is placing a crystal at a chosen grid coordinate. This design is entirely unrelated to any negotiation or trading game; it’s a pure positional alignment challenge grounded in an original magical circuit theme.
+
+---
+
+## 2. Roles and Win Condition
+
+- **Players**:
+  - Player 1: **Solar Architect** (mark → `S`)
+  - Player 2: **Lunar Architect** (mark → `L`)
+
+- **Objective**: Align three of one’s crystals (`S` or `L`) in a row, column, or diagonal.
+
+- **Win Condition**:
+  - A player **wins** if, after their turn, three of their marks form an unbroken line.
+  - The opposing player **loses** immediately.
+  - If all nine nodes are filled and no line is formed, the outcome is a **draw**.
+
+Decision rules:
+1. Check alignment after every placement.
+2. If both align simultaneously (impossible under alternation), the current mover’s alignment takes precedence.
+3. A draw is declared if the maximum of 9 turns is reached without a win.
+
+---
+
+## 3. Turn Structure and Determinism
+
+- Players alternate strictly (`Solar → Lunar → Solar → ...`).
+- Solar moves first by default.
+- The environment is **fully deterministic**.
+- The seed affects only which player moves first, and only when that option is enabled; otherwise the first mover is fixed. An explicit seed keeps runs reproducible.
+- Turn limit: **9** turns maximum.
+- All outcomes are reproducible given an identical initial seed and move sequence.
+
+---
+
+## 4. Action Grammar (Machine-Parseable)
+
+**Allowed Action Tokens**:
+Each player may perform:
+
+```
+[Place: <row>,<col>]
+```
+
+where:
+- `<row>` and `<col>` ∈ `{1,2,3}`
+- Positions are 1-indexed (top-left is 1,1; bottom-right is 3,3)
+- No repetition: a crystal cannot be placed on a node that already holds one.
+
+**Formal Pattern (regex)**:
+```
+^\[Place:\s*([1-3]),\s*([1-3])\]$
+```
+
+**Example valid action**:
+```
+[Place: 2,3]
+```
+Reason valid: Correct token keyword, valid 1–3 coordinates, comma-separated.
+
+**Example invalid action #1**:
+```
+[Place: 0,3]
+```
+Reason invalid: Row out of allowed range (must be 1–3), so the pattern does not match.
+
+**Example invalid action #2**:
+```
+[Play: 2,3]
+```
+Reason invalid: Token keyword incorrect (‘Play’ not recognized).
+
+In actual play, players will respond with `\boxed{{[Place: 2,3]}}` formatting, but the validation operates on the unboxed content.
+
+---
+
+## 5. Game State Schema
+
+Example runtime structure:
+
+```json
+{
+  "turn_count": 4,
+  "current_player": "Solar",
+  "grid": [
+    ["S", "L", null],
+    [null, "S", null],
+    [null, null, "L"]
+  ],
+  "available_cells": [[1,3],[2,1],[2,3],[3,1],[3,2]],
+  "winner": null,
+  "is_terminal": false,
+  "observations": {
+    "Solar": "Previous move: [Place: 3,3] by Lunar.",
+    "Lunar": "Your opponent placed [Place: 2,2]."
+  },
+  "history": [
+    "Solar → [Place: 1,1]",
+    "Lunar → [Place: 1,2]",
+    "Solar → [Place: 2,2]",
+    "Lunar → [Place: 3,3]"
+  ],
+  "seed": 42,
+  "score": {
+    "Solar": 0,
+    "Lunar": 0
+  }
+}
+```
+
+---
+
+## 6. Initialization Rules
+
+- The grid initializes to all `null` (empty).
+- `current_player` is set to `"Solar"` unless overridden by the seed-based first-mover rule.
+- `winner` is set to `null` and `turn_count` to `0`.
+- The `seed` controls only the optional choice of first mover; it does not affect the grid, which always starts empty.
+- Initial observations describe the empty grid and player symbols:
+
+  ```
+  The Crystal Grid is empty.
+  You are Solar Architect (symbol ‘S’).
+  Your charge begins first.
+  ```
+
+---
+
+## 7. Validation and Error Handling
+
+**Invalid move triggers**:
+1. **Pattern mismatch**: doesn’t match regex → “Action format not recognized.”
+2. **Cell out of range**: row/col outside 1–3 (a defensive check, since the regex already rejects single digits outside 1–3) → “Coordinates must be between 1 and 3.”
+3. **Cell occupied**: attempted placement on non-null grid cell → “That node already holds a crystal.”
+4. **Not your turn**: action attempted by the wrong player → “It is not your turn.”
+
+When any of these is detected, the environment reports it via `set_invalid_move(player_id, reason)`.
+
+---
+
+## 8. Terminal Conditions and Scoring
+
+**Check each turn**:
+1. After placement, verify whether that player occupies any of the 8 winning lines:
+   - Rows: `(r,1)-(r,2)-(r,3)` for each row `r` in 1–3
+   - Columns: `(1,c)-(2,c)-(3,c)` for each column `c` in 1–3
+   - Diagonals: `(1,1)-(2,2)-(3,3)` and `(1,3)-(2,2)-(3,1)`
+2. If true → `is_terminal = True`, `winner = current_player`.
+3. If `turn_count == 9` and no winner → `is_terminal = True`, `winner = "draw"`.
+
+**Scoring convention**:
+- Win → +1 score for winner, 0 for loser.
+- Draw → 0.5 each.
+
+**Tie-break**:
+- None required beyond draw detection.
+
+---
+
+## 9. Player Prompt Specification
+
+When `_generate_player_prompt` runs, it assembles structured text along these lines:
+
+**Prompt Outline:**
+- Identity blurb:
+  “You are a mystic architect competing on the Crystal Grid. Align three of your charged crystals before your opponent does.”
+- Display of the current board in a human-readable format with coordinates.
+- Summary of the allowed action:
+  ```
+  Allowed Action: [Place: row,col]
+  where row and col are integers in {1,2,3}.
+  ```
+- Explanation of win conditions and turn rules.
+- Example response instructions, including the final boxed action.
+
+**Example snippets shown in prompt:**
+```
+Example valid response:
+I will charge the central node for structural balance.
+\boxed{{[Place: 2,2]}}
+```
+
+If an invalid attempt occurs:
+```
+Invalid example (do not use):
+\boxed{{[Play: 2,2]}}  <-- token must be [Place: ...]
+```
+
+**Response format reminder**:
+“At the end of your message, put your final answer within \boxed{{}} using one allowed action.”
+
+---
+
+## 10. API Mapping Plan
+
+**reset()**
+- Clears the grid and logs, and initializes player identifiers and the seed.
+- Returns the initial `game_state`, the first `observations`, and references to legal actions.
+- No winner or score yet.
+
+**step(player_action)**
+- Extracts the action via `_extract_answer_content()`.
+- Validates syntax and legality.
+- If valid: updates the grid, increments the turn count, and checks for a winner or draw.
+- Updates `observations` to include a transcript of both players’ actions.
+- Returns the updated `game_state`, a new observation for the next player, and a completion flag.
+
+**_generate_player_prompt(player_id)**
+- Constructs the full textual prompt described above from the player’s perspective.
+- Reads `game_state`: grid, turn count, player symbols, etc.
+- Inserts few-shot examples of valid/invalid moves.
+- Ends with an explicit directive to produce `\boxed{{[Place: <row>,<col>]}}`.
+
+---
+
+## 11. Copy-Check Against the Example
+
+All entities, vocabulary, and schema in “**Crystal Grid**” are fully original:
+- Theme: mystical crystal alignment, not trading, negotiation, or pricing.
+- Roles: Solar and Lunar Architects, not merchants or diplomats.
+- Objective: spatial pattern alignment, not bargaining success.
+- Keys (`grid`, `available_cells`, `seed`, `history`, etc.) are unique to this design.
+- Actions: `[Place: row,col]` tokens, entirely distinct from any negotiation-style `[Offer]` or `[Accept]`.
+
+This confirms full conceptual independence from the previous example while preserving the structure needed for a deterministic, turn-based API implementation.
\ No newline at end of file
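
The validate-then-update flow that Sections 4, 7, 8, and 10 describe for `step()` might look roughly like the following. This is only a sketch, not the environment's actual code: `new_state` and `apply_action` are hypothetical names, the state is a trimmed version of the Section 5 schema, the order of the validation checks is an assumption, and the real `set_invalid_move` call is replaced by returning the error message.

```python
import re

# Regex copied from Section 4; it already rejects coordinates outside 1-3.
ACTION_RE = re.compile(r"^\[Place:\s*([1-3]),\s*([1-3])\]$")

# The 8 winning lines from Section 8 as 1-indexed (row, col) triples.
WIN_LINES = (
    [[(r, c) for c in (1, 2, 3)] for r in (1, 2, 3)]      # rows
    + [[(r, c) for r in (1, 2, 3)] for c in (1, 2, 3)]    # columns
    + [[(1, 1), (2, 2), (3, 3)], [(1, 3), (2, 2), (3, 1)]]  # diagonals
)


def new_state():
    """Hypothetical helper: empty grid with Solar to move (Section 6)."""
    return {
        "turn_count": 0,
        "current_player": "Solar",
        "grid": [[None] * 3 for _ in range(3)],
        "winner": None,
        "is_terminal": False,
    }


def apply_action(state, player, action):
    """Validate and apply one unboxed action string.

    Returns None on success, or a Section 7 error message on failure.
    The check order (turn, pattern, occupancy) is an assumption.
    """
    if player != state["current_player"]:
        return "It is not your turn."
    match = ACTION_RE.match(action.strip())
    if match is None:
        return "Action format not recognized."
    row, col = int(match.group(1)), int(match.group(2))
    if state["grid"][row - 1][col - 1] is not None:
        return "That node already holds a crystal."

    mark = "S" if player == "Solar" else "L"
    state["grid"][row - 1][col - 1] = mark
    state["turn_count"] += 1

    # Terminal checks from Section 8: win first, then draw at turn 9.
    if any(all(state["grid"][r - 1][c - 1] == mark for r, c in line)
           for line in WIN_LINES):
        state["is_terminal"], state["winner"] = True, player
    elif state["turn_count"] == 9:
        state["is_terminal"], state["winner"] = True, "draw"
    else:
        state["current_player"] = "Lunar" if player == "Solar" else "Solar"
    return None
```

In this sketch a caller would create a state with `new_state()`, pass each extracted action to `apply_action`, and treat any non-`None` return value as the reason for an invalid move.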