diff --git a/environment.md b/environment.md
new file mode 100644
index 0000000..6473649
--- /dev/null
+++ b/environment.md
@@ -0,0 +1,223 @@
+---
+
+# **TIC-TAC-TRAIL: A Turn-Based Strategy Design Document**
+
+---
+
+## **1. Concept Paragraph**
+
+**Concept Overview:**
+*Tic-Tac-Trail* is a deterministic, turn-based tactical puzzle inspired by grid conquest and unrelated to negotiation or trade mechanics. Two explorer teams, **Team Sun** and **Team Moon**, compete to claim paths on an ancient 3×3 stone map. Each tile can be marked with a team's emblem (`Sun` or `Moon`). The first team to align three of its emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple’s power and wins. Core player commands are expressed as `[Mark:<r>,<c>]`, naming the grid position to claim, or `[Pass]` if no legal move remains. The environment tracks placement, board state, turn order, and victory conditions deterministically.
+
+---
+
+## **2. Roles and Win Condition**
+
+- **Players:**
+  - Player 1: *Team Sun* (symbol “S”)
+  - Player 2: *Team Moon* (symbol “M”)
+
+- **Objective:**
+  Align three of one’s symbols (`S` or `M`) in a straight line (row, column, or diagonal) before the board fills.
+
+- **Decision Rules:**
+  - **Win:** First player to form an unbroken trio of their own emblem.
+  - **Loss:** Opponent achieves a trio first.
+  - **Draw:** All nine tiles filled without a winning alignment.
+  - Once a win or draw occurs, the game becomes terminal and no further moves are accepted.
+
+---
+
+## **3. Turn Structure and Determinism**
+
+- The game alternates turns strictly: *Sun → Moon → Sun → Moon*, and so on.
+- Turn count begins at 1 and increments after each valid action.
+- A game lasts at most nine turns, since there are nine cells.
+- No random factors exist; the game is **fully deterministic**.
+- A seed value is still stored in state for reproducibility, but no game logic uses it, so replays are always identical.
+
+---
+
+## **4. Action Grammar (Machine-Parseable)**
+
+**Permitted Actions:**
+
+### 4.1 Mark a Tile
+- **Token Format:** `[Mark:<r>,<c>]`
+- **Pattern (regex):** `^\[Mark:(0|1|2),(0|1|2)\]$`
+- **Semantics:** Current player places their symbol on the specified cell `(row, col)` if it is empty.
+
+**Examples:**
+- ✅ **Valid:** `[Mark:0,2]`: Player marks top-right cell.
+- ❌ **Invalid:** `[Mark:3,1]`: Row "3" out of range (valid rows: 0–2).
+- ❌ **Invalid:** `[Mark:1-2]`: Comma separator missing.
+
+### 4.2 Pass
+- **Token Format:** `[Pass]`
+- **Pattern (regex):** `^\[Pass\]$`
+- **Semantics:** Legal only if no empty cell remains. Because a full board already ends the game as a draw (Section 8), this action acts as a safeguard and should almost never be needed.
+
+**Examples:**
+- ✅ **Valid:** `[Pass]`: Player skips turn.
+- ❌ **Invalid:** `[PASS]`: The token is case-sensitive and must match `[Pass]` exactly.
+
+---
+
+## **5. Game State Schema**
+
+```json
+{
+  "seed": 42,
+  "turn_count": 1,
+  "current_player": "Sun",
+  "board_state": [
+    ["_", "_", "_"],
+    ["_", "_", "_"],
+    ["_", "_", "_"]
+  ],
+  "player_symbols": {
+    "Sun": "S",
+    "Moon": "M"
+  },
+  "history": [
+    {"player": "System", "message": "The ancient board awaits."}
+  ],
+  "winner": null,
+  "status": "ongoing",
+  "available_moves": [
+    [0, 0], [0, 1], [0, 2],
+    [1, 0], [1, 1], [1, 2],
+    [2, 0], [2, 1], [2, 2]
+  ],
+  "scores": {
+    "Sun": 0,
+    "Moon": 0
+  }
+}
+```
+
+- Keys reflect a unique thematic world: the ancient “trail” board, emblems for “Sun” and “Moon,” and a clear distinction from any negotiation-like schema.
+
+---
+
+## **6. Initialization Rules**
+
+- When `reset(seed)` is called:
+  1. The RNG is seeded with `seed`, although no game logic draws from it.
+  2. The `board_state` is filled with `_` symbols representing empty stone tiles.
+  3. The first turn is always `Sun`.
+  4. The `history` log begins with a world description.
+  5. `available_moves` includes all `(row, col)` pairs.
+- Observation: Both players receive an identical initial description and empty board visualization.
+
+---
+
+## **7. Validation and Error Handling**
+
+- **Extraction:**
+  The environment will extract content from within `\boxed{{}}` using `_extract_answer_content(action)`.
+
+- **Validation Steps:**
+  1. Verify the action string matches one of the two regex patterns.
+  2. If `[Mark:<r>,<c>]`, check:
+     - 0 ≤ r, c ≤ 2
+     - Corresponding cell is unoccupied (`"_"`).
+  3. If `[Pass]`, ensure no playable cells remain; otherwise invalid.
+
+- **Invalid Reasons (examples):**
+  - "Invalid format — must be [Mark:r,c] or [Pass]."
+  - "Chosen cell already occupied."
+  - "Row or column index out of range."
+  - "Cannot pass while moves still available."
+
+If the action is invalid, the system invokes `set_invalid_move(reason)`, and forfeit logic may apply depending on the higher-level controller.
+
+---
+
+## **8. Terminal Conditions and Scoring**
+
+**Checks performed after each valid move:**
+
+1. **Win Check:**
+   If the current player owns three symbols aligned horizontally, vertically, or diagonally:
+   - `winner = current_player`
+   - `status = "finished"`
+   - `scores[current_player] = 1`
+   - Opponent receives 0.
+
+2. **Draw Check:**
+   If all cells are filled and there is no winner:
+   - `winner = null`
+   - `status = "draw"`
+   - Both scores = 0.5.
+
+3. **Continue Otherwise:**
+   - `status = "ongoing"`
+   - Proceed to next player.
+
+**Tie-Break:**
+None beyond the declared draw; equal scoring applies.
+
+---
+
+## **9. Player Prompt Specification**
+
+**Prompt Identity and Instructions:**
+
+Each turn’s prompt should contain:
+
+1. A brief world intro:
+   “You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
+2. The current board visualization (3×3 grid of `_`, `S`, `M`).
+3. The list of allowed action formats:
+   - `[Mark:<r>,<c>]` where `<r>` (row) and `<c>` (column) are integers 0–2.
+   - `[Pass]` if no unclaimed tiles remain.
+4. Reminder of victory condition: “Align three of your emblems in a straight line.”
+5. Rule reminder: “All actions must be enclosed in `\boxed{{}}` at the end of your message.”
+
+**Few-shot examples:**
+
+```
+Example valid response:
+I should take the center stone before my rival.
+\boxed{{[Mark:1,1]}}
+```
+
+```
+Example invalid response (wrong format):
+\boxed{{Mark:1,1}}  <-- Missing brackets [ ]
+```
+
+```
+Example valid response (board full, passing):
+No moves left, I will pass.
+\boxed{{[Pass]}}
+```
+
+**Extraction Function Notice:**
+`_extract_answer_content(self, action: str) -> str` will strip the `\boxed{{}}` syntax and return the inner content for validation.
+
+---
+
+## **10. API Mapping Plan**
+
+| Method | Purpose | Operations on Game State | Output |
+|--------|----------|--------------------------|--------|
+| **`reset(seed)`** | Initialize the game | Sets all keys per schema, stores the seed, fills the board with `_`, assigns the first player (`Sun`), populates `available_moves`, generates the initial system message | Returns initial `observations` for both players |
+| **`step(player_action)`** | Process one player's move | 1. Extract content with `_extract_answer_content` 2. Validate grammar & legality 3. If valid, apply to `board_state` 4. Append to `history` 5. Update `available_moves` 6. Check win/draw conditions, adjust scores, and advance turn | Returns updated `observations`, reward info, `done` flag |
+| **`_generate_player_prompt(player_id)`** | Builds textual context for that player | Uses the current `board_state`, `turn_count`, and list of legal actions. Demonstrates correct formatting. | Returns formatted prompt string instructing the player to end with a `\boxed{{}}` action |
+
+---
+
+## **11. Copy-Check Against the Example**
+
+This design is **fully distinct** from any negotiation or resource-trading environment.
+- **Theme:** Archaeological puzzle arena (grid conquest), not negotiation.
+- **Objectives:** Claim territory and form a line, not reach mutual agreements.
+- **Entities:** Ancient stones and Sun and Moon symbols, not participants in a deal.
+- **Game State Keys:** `board_state`, `player_symbols`, `available_moves`, and `scores` are entirely original.
+- **Prompt Text:** References *Tic-Tac-Trail* and ancient exploration, not disputes or offers.
+
+Therefore, this specification represents a fully self-contained, original turn-based environment for a deterministic **tic-tac-toe–style strategy challenge**, compliant with TextArena architecture.
+
+---
\ No newline at end of file
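
The validation steps in Section 7 and the terminal checks in Section 8 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the environment's actual implementation: the names `validate_action`, `check_status`, `LOOSE_MARK_RE`, and `WIN_LINES` are hypothetical, and the looser regex exists only so that an out-of-range index can be reported separately from a malformed token. The input is assumed to be the string already returned by `_extract_answer_content`.

```python
import re
from typing import List, Optional, Tuple

Board = List[List[str]]

# Strict patterns copied from Section 4 of the spec.
MARK_RE = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_RE = re.compile(r"^\[Pass\]$")
# Assumption: a looser pattern, not in the spec, used only to tell
# "index out of range" apart from "invalid format".
LOOSE_MARK_RE = re.compile(r"^\[Mark:(\d+),(\d+)\]$")

# The eight winning lines on a 3x3 board: rows, columns, diagonals.
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def validate_action(board: Board, action: str) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, reason) for an already-extracted action string."""
    mark = MARK_RE.match(action)
    if mark:
        row, col = int(mark.group(1)), int(mark.group(2))
        if board[row][col] != "_":
            return False, "Chosen cell already occupied."
        return True, None
    if LOOSE_MARK_RE.match(action):
        # Well-formed token, but the strict pattern rejected the indices.
        return False, "Row or column index out of range."
    if PASS_RE.match(action):
        if any(cell == "_" for row in board for cell in row):
            return False, "Cannot pass while moves still available."
        return True, None
    return False, "Invalid format — must be [Mark:r,c] or [Pass]."


def check_status(board: Board, symbol: str) -> str:
    """Return "finished" if `symbol` owns a full line, "draw" if the
    board is full without such a line, and "ongoing" otherwise."""
    for line in WIN_LINES:
        if all(board[r][c] == symbol for r, c in line):
            return "finished"
    if all(cell != "_" for row in board for cell in row):
        return "draw"
    return "ongoing"
```

In a `step` implementation, `check_status` would be called with the symbol of the player who just moved, since only that player can have completed a line on the current turn.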