---
# **TIC-TAC-TRAIL: A Turn-Based Strategy Design Document**
---
## **1. Concept Paragraph**
**Concept Overview:**
*Tic-Tac-Trail* is a deterministic, turn-based tactical puzzle built around grid conquest; it is unrelated to negotiation or trade mechanics. Two explorers, **Team Sun** and **Team Moon**, compete to claim paths on an ancient 3×3 stone map. Each tile can be marked with a team's emblem (`Sun` or `Moon`). The first expedition to align three of its emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple's power and wins. Core player commands are expressed as `[Mark:<row>,<col>]`, which names the grid position to claim, or `[Pass]` if no legal move remains. The environment tracks placement, board state, turn order, and victory conditions deterministically.
---
## **2. Roles and Win Condition**
- **Players:**
  - Player 1: *Team Sun* (symbol “S”)
  - Player 2: *Team Moon* (symbol “M”)
- **Objective:**
  Align three of one's symbols (`S` or `M`) in a straight line (row, column, or diagonal) before the board fills.
- **Decision Rules:**
  - **Win:** First player to form an unbroken trio of their own emblem.
  - **Loss:** Opponent achieves a trio first.
  - **Draw:** All nine tiles filled without a winning alignment.
  - Once a win or draw occurs, the game becomes terminal and no further moves are accepted.
---
## **3. Turn Structure and Determinism**
- The game alternates turns strictly: *Sun → Moon → Sun → Moon*, and so on.
- Turn count begins at 1 and increments after each valid action.
- Maximum of nine turns (since there are nine cells).
- No random factors exist; the game is **fully deterministic**.
- A seed value is still stored in state for reproducibility, but nothing draws on it, so replays are always identical.
---
## **4. Action Grammar (Machine-Parseable)**
**Permitted Actions:**
### 4.1 Mark a Tile
- **Token Format:** `[Mark:<row>,<col>]`
- **Pattern (regex):** `^\[Mark:(0|1|2),(0|1|2)\]$`
- **Semantics:** The current player places their symbol on the specified cell `(row, col)` if it is empty.
**Examples:**
- **Valid:** `[Mark:0,2]` (the player marks the top-right cell).
- **Invalid:** `[Mark:3,1]` (row "3" is out of range; valid rows are 0 to 2).
- **Invalid:** `[Mark:1-2]` (the comma separator is missing).
### 4.2 Pass
- **Token Format:** `[Pass]`
- **Pattern (regex):** `^\[Pass\]$`
- **Semantics:** Used only if the player has no valid cell remaining (rare in tic-tac-toe, since a full board normally ends the game as a draw first).
**Examples:**
- **Valid:** `[Pass]` (the player skips the turn).
- **Invalid:** `[PASS]` (the token is case-sensitive and must match `[Pass]` exactly).
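
The two patterns above can be exercised with a short parser. This is a minimal sketch, not the environment's actual code: the function name `parse_action` and its return shape are assumptions made for illustration. It uses `re.fullmatch` so that a trailing newline, which `$` alone would tolerate in Python, is rejected.

```python
import re
from typing import Optional, Tuple

# Patterns copied from sections 4.1 and 4.2.
MARK_RE = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_RE = re.compile(r"^\[Pass\]$")

Parsed = Tuple[str, Optional[Tuple[int, int]]]


def parse_action(token: str) -> Optional[Parsed]:
    """Hypothetical helper: return ("mark", (row, col)), ("pass", None),
    or None when the token matches neither pattern."""
    mark = MARK_RE.fullmatch(token)
    if mark:
        return ("mark", (int(mark.group(1)), int(mark.group(2))))
    if PASS_RE.fullmatch(token):
        return ("pass", None)
    return None


if __name__ == "__main__":
    for token in ["[Mark:0,2]", "[Mark:3,1]", "[Mark:1-2]", "[Pass]", "[PASS]"]:
        print(token, "->", parse_action(token))
```
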
---
## **5. Game State Schema**
```json
{
  "seed": 42,
  "turn_count": 1,
  "current_player": "Sun",
  "board_state": [
    ["_", "_", "_"],
    ["_", "_", "_"],
    ["_", "_", "_"]
  ],
  "player_symbols": {
    "Sun": "S",
    "Moon": "M"
  },
  "history": [
    {"player": "System", "message": "The ancient board awaits."}
  ],
  "winner": null,
  "status": "ongoing",
  "available_moves": [
    [0, 0], [0, 1], [0, 2],
    [1, 0], [1, 1], [1, 2],
    [2, 0], [2, 1], [2, 2]
  ],
  "scores": {
    "Sun": 0,
    "Moon": 0
  }
}
```
- Keys reflect a unique thematic world: the ancient “trail” board, emblems for “Sun” and “Moon,” and clear distinction from any negotiation-like schema.
---
## **6. Initialization Rules**
- When `reset(seed)` is called:
  1. The RNG is seeded with `seed`, although nothing draws from it, so play stays deterministic.
  2. The `board_state` is filled with `_` symbols representing empty stone tiles.
  3. The first turn is always `Sun`.
  4. The `history` log begins with a world description.
  5. `available_moves` includes all `(row, col)` pairs.
- Observation: Both players receive an identical initial description and empty-board visualization.
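
The initialization rules can be sketched as a plain function that builds the state dictionary from section 5. The name `make_initial_state` is an assumption; the real environment would do this inside `reset(seed)`, and how it stores its RNG is not specified here.

```python
import random
from typing import Any, Dict


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Hypothetical helper mirroring the reset rules and the section 5 schema."""
    # Seeded for reproducibility only; no random draws are ever made.
    _rng = random.Random(seed)
    return {
        "seed": seed,
        "turn_count": 1,
        "current_player": "Sun",  # Sun always moves first
        # Build each row separately so rows do not share one list object.
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [{"player": "System", "message": "The ancient board awaits."}],
        "winner": None,
        "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }
```
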
---
## **7. Validation and Error Handling**
- **Extraction:**
The environment will extract content from within `\boxed{{}}` using `_extract_answer_content(action)`.
- **Validation Steps:**
1. Verify action string matches one of the two regex patterns.
2. If `[Mark:<r>,<c>]`, check:
   - 0 ≤ r, c ≤ 2
   - The corresponding cell is unoccupied (`"_"`).
3. If `[Pass]`, ensure no playable cells remain; otherwise invalid.
- **Invalid Reasons (examples):**
- "Invalid format — must be [Mark:r,c] or [Pass]."
- "Chosen cell already occupied."
- "Row or column index out of range."
- "Cannot pass while moves still available."
If a move is invalid, the system invokes `set_invalid_move(reason)`; forfeit logic may then apply, depending on the higher-level controller.
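
A minimal sketch of the validation steps follows. Two things in it are assumptions rather than part of the spec. First, the function `invalid_reason` returns short hypothetical reason codes instead of the full messages listed above. Second, it parses `[Mark:...]` with a looser pattern than the one in section 4.1, because the strict pattern would reject `[Mark:3,1]` as a format error and the separate out-of-range reason could never be reported.

```python
import re
from typing import List, Optional

# Assumption: accept any non-negative integers here so that range errors
# can be told apart from format errors.
LOOSE_MARK_RE = re.compile(r"\[Mark:([0-9]+),([0-9]+)\]")


def invalid_reason(token: str, board: List[List[str]]) -> Optional[str]:
    """Hypothetical helper: return None for a legal action, else a reason code."""
    if token == "[Pass]":
        any_empty = any(cell == "_" for row in board for cell in row)
        return "pass_not_allowed" if any_empty else None
    match = LOOSE_MARK_RE.fullmatch(token)
    if match is None:
        return "bad_format"
    row, col = int(match.group(1)), int(match.group(2))
    if not (0 <= row <= 2 and 0 <= col <= 2):
        return "out_of_range"
    if board[row][col] != "_":
        return "occupied"
    return None
```
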
---
## **8. Terminal Conditions and Scoring**
**Checks performed after each valid move:**
1. **Win Check:**
   If the current player owns three symbols aligned horizontally, vertically, or diagonally:
   - `winner = current_player`
   - `status = "finished"`
   - `scores[current_player] = 1`
   - Opponent receives 0.
2. **Draw Check:**
   If all cells filled and no winner:
   - `winner = null`
   - `status = "draw"`
   - Both scores = 0.5.
3. **Continue Otherwise:**
   - `status = "ongoing"`
   - Proceed to next player.
**Tie-Break:**
None beyond declared draw; equal scoring applies.
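
The checks above can be sketched as one function over the eight possible lines. The name `check_terminal` and its tuple return value are assumptions for illustration; the scores returned for an ongoing game simply repeat the zeros from the initial schema.

```python
from typing import Dict, List, Optional, Tuple

# All eight winning lines on a 3x3 board, as (row, col) triples.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]      # rows
    + [[(r, c) for r in range(3)] for c in range(3)]    # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)


def check_terminal(
    board: List[List[str]], current_player: str, symbol: str
) -> Tuple[str, Optional[str], Dict[str, float]]:
    """Hypothetical helper: return (status, winner, scores) after a valid move."""
    opponent = "Moon" if current_player == "Sun" else "Sun"
    # Win check comes first, so a winning ninth move is not scored as a draw.
    if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
        return "finished", current_player, {current_player: 1, opponent: 0}
    if all(cell != "_" for row in board for cell in row):
        return "draw", None, {"Sun": 0.5, "Moon": 0.5}
    return "ongoing", None, {"Sun": 0, "Moon": 0}
```
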
---
## **9. Player Prompt Specification**
**Prompt Identity and Instructions:**
Each turn's prompt should contain:
1. A brief world intro:
“You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
2. The current board visualization (3×3 grid of `_`, `S`, `M`).
3. The list of allowed action formats:
- `[Mark:<row>,<col>]` where `<row>` and `<col>` are integers from 0 to 2.
- `[Pass]` if no unclaimed tiles remain.
4. Reminder of victory condition: “Align three of your emblems in a straight line.”
5. Rule reminder: “All actions must be enclosed in `\boxed{{}}` at the end of your message.”
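
The five prompt elements can be assembled as in the sketch below. The function name `build_prompt`, the space-separated board rendering, and the exact line layout are assumptions; only the quoted sentences come from the list above.

```python
from typing import List


def build_prompt(player: str, board: List[List[str]]) -> str:
    """Hypothetical prompt builder; `player` is "Sun" or "Moon"."""
    # Render the board as three lines of space-separated cells.
    grid = "\n".join(" ".join(row) for row in board)
    return (
        f"You are an explorer representing Team {player} "
        "claiming tiles on the ancient Tic-Tac-Trail.\n"
        f"Current board:\n{grid}\n"
        "Allowed actions:\n"
        "- [Mark:<row>,<col>] where <row> and <col> are integers from 0 to 2.\n"
        "- [Pass] if no unclaimed tiles remain.\n"
        "Align three of your emblems in a straight line.\n"
        # Plain (non-f) string, so the braces appear exactly as written in the spec.
        "All actions must be enclosed in \\boxed{{}} at the end of your message."
    )


if __name__ == "__main__":
    print(build_prompt("Sun", [["_", "S", "_"], ["_", "M", "_"], ["_", "_", "_"]]))
```
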
**Few-shot examples:**
```
Example valid response:
I should take the center stone before my rival.
\boxed{{[Mark:1,1]}}
```
```
Example invalid response (wrong format):
\boxed{{Mark:1,1}} <-- Missing brackets [ ]
```
```
Example valid response (board full, passing):
No moves left, I will pass.
\boxed{{[Pass]}}
```
**Extraction Function Notice:**
`_extract_answer_content(self, action: str) -> str` will strip `\boxed{{}}` syntax and return internal content for validation.
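
One possible shape for the extraction step is sketched here as a free function. Its behavior is assumed, not taken from the actual `_extract_answer_content` implementation: it takes the last `\boxed` occurrence, tolerates either single or doubled braces, and falls back to the stripped input when no `\boxed` wrapper is present.

```python
import re

# Assumption: one or more opening braces, lazily captured content,
# then one or more closing braces.
BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+")


def extract_answer_content(action: str) -> str:
    """Hypothetical stand-in for `_extract_answer_content`."""
    matches = BOXED_RE.findall(action)
    if matches:
        # Use the last boxed answer in the message.
        return matches[-1].strip()
    return action.strip()


if __name__ == "__main__":
    reply = r"I should take the center stone before my rival. \boxed{{[Mark:1,1]}}"
    print(extract_answer_content(reply))
```
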
---
## **10. API Mapping Plan**
| Method | Purpose | Operations on Game State | Output |
|--------|----------|--------------------------|--------|
| **`reset(seed)`** | Initialize the game | Sets all keys per schema, initializes the seed and board, assigns the first player (`Sun`), populates `available_moves`, and generates the initial system message | Returns initial `observations` for both players |
| **`step(player_action)`** | Process one player's move | 1. Extract content with `_extract_answer_content` 2. Validate grammar & legality 3. If valid, apply to `board_state` 4. Append to `history` 5. Update `available_moves` 6. Check win/draw conditions, adjust scores, and advance turn | Returns updated `observations`, reward info, `done` flag |
| **`_generate_player_prompt(player_id)`** | Builds textual context for that player | Uses the current `board_state`, `turn_count`, and list of legal actions. Demonstrates correct formatting. | Returns formatted prompt string instructing the player to end with a `\boxed{{}}` action |
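
The `step` row can be illustrated with a compact sketch that works on the state dictionary from section 5. Several simplifications are assumptions: the token is taken as already extracted, only `[Mark:...]` is handled, a `ValueError` stands in for `set_invalid_move`, and the function returns `(state, done)` rather than observations and rewards.

```python
import re
from typing import Any, Dict, Tuple

MARK_RE = re.compile(r"\[Mark:(0|1|2),(0|1|2)\]")
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def step(state: Dict[str, Any], token: str) -> Tuple[Dict[str, Any], bool]:
    """Hypothetical step: apply one already-extracted Mark token in place."""
    if state["status"] != "ongoing":
        raise ValueError("game is already terminal")
    match = MARK_RE.fullmatch(token)
    if match is None:
        raise ValueError("invalid format")
    row, col = int(match.group(1)), int(match.group(2))
    board = state["board_state"]
    if board[row][col] != "_":
        raise ValueError("cell already occupied")

    player = state["current_player"]
    opponent = "Moon" if player == "Sun" else "Sun"
    symbol = state["player_symbols"][player]

    # Apply the move, log it, and update the legal-move list.
    board[row][col] = symbol
    state["history"].append({"player": player, "message": token})
    state["available_moves"].remove([row, col])

    # Win check first, then draw check, otherwise hand over the turn.
    if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
        state["status"], state["winner"] = "finished", player
        state["scores"] = {player: 1, opponent: 0}
    elif not state["available_moves"]:
        state["status"] = "draw"
        state["scores"] = {"Sun": 0.5, "Moon": 0.5}
    else:
        state["current_player"] = opponent
    state["turn_count"] += 1  # increments after each valid action
    return state, state["status"] != "ongoing"
```
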
---
## **11. Copy-Check Against the Example**
This design is **fully distinct** from any negotiation or resource-trading environment.
- **Theme:** Archaeological puzzle arena (grid conquest), not negotiation.
- **Objectives:** Claim territory and form a line, not reach mutual agreements.
- **Entities:** Ancient stones, Sun and Moon symbols—not participants in a deal.
- **Game State Keys:** `board_state`, `player_symbols`, `available_moves`, and `scores`—entirely original.
- **Prompt Text:** References *Tic-Tac-Trail* and ancient exploration, not disputes or offers.
Therefore, this specification represents a fully self-contained original turn-based environment for a deterministic **tic-tac-toe-style strategy challenge**, compliant with the TextArena architecture.
---