1. Concept Paragraph
Concept Overview:
Tic-Tac-Trail is a deterministic, turn-based tactical puzzle inspired by grid conquest—completely unrelated to negotiation or trade mechanics. Two explorers, Team Sun and Team Moon, compete to claim paths on an ancient 3×3 stone map. Each tile can be marked with their emblem (Sun or Moon). The first expedition to align three of their emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple’s power and wins. Core player commands are expressed as [Mark:<row>,<col>], describing which grid position to claim, or [Pass] if no legal move remains. The environment tracks placement, board state, turn order, and victory conditions deterministically.
2. Roles and Win Condition
- Players:
  - Player 1: Team Sun (symbol “S”)
  - Player 2: Team Moon (symbol “M”)
- Objective: Align three of one’s symbols (S or M) in a straight line (row, column, or diagonal) before the board fills.
- Decision Rules:
  - Win: First player to form an unbroken trio of their own emblem.
  - Loss: Opponent achieves a trio first.
  - Draw: All nine tiles filled without a winning alignment.
  - Once a win or draw occurs, the game becomes terminal and no further moves are accepted.
3. Turn Structure and Determinism
- The game alternates turns strictly: Sun → Moon → Sun → Moon, and so on.
- Turn count begins at 1 and increments after each valid action.
- Maximum of nine turns (since there are nine cells).
- No random factors exist; the game is fully deterministic.
- A seed value is still stored in state for reproducibility but is never used, so replays are always consistent.
4. Action Grammar (Machine-Parseable)
Permitted Actions:
4.1 Mark a Tile
- Token Format: `[Mark:<row>,<col>]`
- Pattern (regex): `^\[Mark:(0|1|2),(0|1|2)\]$`
- Semantics: Current player places their symbol on the specified cell `(row, col)` if it’s empty.
Examples:
- ✅ Valid: `[Mark:0,2]` (player marks the top-right cell).
- ❌ Invalid: `[Mark:3,1]` (row "3" is out of range; valid rows: 0–2).
- ❌ Invalid: `[Mark:1-2]` (comma separator missing; a hyphen is used instead).
4.2 Pass
- Token Format: `[Pass]`
- Pattern (regex): `^\[Pass\]$`
- Semantics: Used only if the player has no empty cell remaining. Under the rules above a full board already ends the game, so this is rare in tic-tac-toe.
Examples:
- ✅ Valid: `[Pass]` (player skips the turn).
- ❌ Invalid: `[PASS]` (the token is case-sensitive and must match `[Pass]` exactly).
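The two token formats can be parsed with the patterns given above. The following is a minimal sketch, not the environment's actual parser: the function name `parse_action` and its return shape are assumptions made for illustration. It uses `fullmatch` so that a trailing newline is also rejected, which a bare `$` anchor in Python would otherwise tolerate.

```python
import re
from typing import Optional, Tuple

# Patterns copied from the action grammar in this section.
MARK_PATTERN = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_PATTERN = re.compile(r"^\[Pass\]$")

# Hypothetical return shape: ("mark", (row, col)) or ("pass", None).
ParsedAction = Tuple[str, Optional[Tuple[int, int]]]


def parse_action(token: str) -> Optional[ParsedAction]:
    """Parse an action token; return None if it matches neither format."""
    mark = MARK_PATTERN.fullmatch(token)
    if mark:
        return ("mark", (int(mark.group(1)), int(mark.group(2))))
    if PASS_PATTERN.fullmatch(token):
        return ("pass", None)
    return None
```

Parsing only checks grammar. Whether the chosen cell is free, or whether passing is allowed, still has to be checked against the board.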
5. Game State Schema
{
"seed": 42,
"turn_count": 1,
"current_player": "Sun",
"board_state": [
["_", "_", "_"],
["_", "_", "_"],
["_", "_", "_"]
],
"player_symbols": {
"Sun": "S",
"Moon": "M"
},
"history": [
{"player": "System", "message": "The ancient board awaits."}
],
"winner": null,
"status": "ongoing",
"available_moves": [
[0, 0], [0, 1], [0, 2],
[1, 0], [1, 1], [1, 2],
[2, 0], [2, 1], [2, 2]
],
"scores": {
"Sun": 0,
"Moon": 0
}
}
- Keys reflect a unique thematic world: the ancient “trail” board, emblems for “Sun” and “Moon,” and clear distinction from any negotiation-like schema.
6. Initialization Rules
- When `reset(seed)` is called:
  - The RNG is seeded (though unused for determinism) using `seed`.
  - The `board_state` is filled with `_` symbols representing empty stone tiles.
  - The first turn is always `Sun`.
  - The `history` log begins with a world description.
  - `available_moves` includes all `(row, col)` pairs.
- Observation: Both players receive an identical initial description and an empty board visualization.
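The initialization rules can be sketched as a function that builds the state dictionary from Section 5. This is an illustrative sketch: `make_initial_state` is a hypothetical helper name, and how the real `reset(seed)` stores or returns state is not specified here.

```python
import random
from typing import Any, Dict


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Build the starting state described by the schema (hypothetical helper)."""
    _rng = random.Random(seed)  # seeded as the rules require, but never consulted
    return {
        "seed": seed,
        "turn_count": 1,
        "current_player": "Sun",  # Sun always moves first
        # A fresh list per row, so marking one row never aliases another.
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [{"player": "System", "message": "The ancient board awaits."}],
        "winner": None,
        "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }
```

Because the RNG is never used, two states built from different seeds differ only in their `seed` field.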
7. Validation and Error Handling
- Extraction: The environment will extract content from within `\boxed{{}}` using `_extract_answer_content(action)`.
- Validation Steps:
  - Verify action string matches one of the two regex patterns.
  - If `[Mark:<r>,<c>]`, check:
    - 0 ≤ r,c ≤ 2
    - Corresponding cell is unoccupied (`"_"`).
  - If `[Pass]`, ensure no playable cells remain; otherwise invalid.
- Invalid Reasons (examples):
- "Invalid format — must be [Mark:r,c] or [Pass]."
- "Chosen cell already occupied."
- "Row or column index out of range."
- "Cannot pass while moves still available."
If the action is invalid, the system invokes `set_invalid_move(reason)`; forfeit logic may apply depending on the higher-level controller.
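The validation steps above can be sketched as a function that returns a reason string, or `None` when the action is legal. Two things here are assumptions rather than part of the specification: the name `validate_action`, and the looser Mark pattern, which accepts any integers so that out-of-range indices get their own reason instead of a format error. The real environment would pass the reason to `set_invalid_move(reason)`.

```python
import re
from typing import List, Optional

# Assumption: looser than the grammar's (0|1|2) pattern so that range
# errors can be reported separately from format errors.
MARK_RE = re.compile(r"\[Mark:(-?\d+),(-?\d+)\]")
PASS_RE = re.compile(r"\[Pass\]")


def validate_action(token: str, board: List[List[str]]) -> Optional[str]:
    """Return None if `token` is legal on `board`, else an invalid-move reason."""
    if PASS_RE.fullmatch(token):
        board_full = all(cell != "_" for row in board for cell in row)
        return None if board_full else "Cannot pass while moves still available."
    mark = MARK_RE.fullmatch(token)
    if mark is None:
        # \u2014 is the dash character used in the specification's message.
        return "Invalid format \u2014 must be [Mark:r,c] or [Pass]."
    row, col = int(mark.group(1)), int(mark.group(2))
    if not (0 <= row <= 2 and 0 <= col <= 2):
        return "Row or column index out of range."
    if board[row][col] != "_":
        return "Chosen cell already occupied."
    return None
```

Returning the reason rather than raising keeps the check separate from whatever forfeit policy the controller applies.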
8. Terminal Conditions and Scoring
Checks performed after each valid move:
- Win Check: If the current player owns three symbols aligned horizontally, vertically, or diagonally:
  - `winner = current_player`
  - `status = "finished"`
  - `scores[current_player] = 1`
  - Opponent receives 0.
- Draw Check: If all cells filled and no winner:
  - `winner = null`
  - `status = "draw"`
  - Both scores = 0.5.
- Continue Otherwise:
  - `status = "ongoing"`
  - Proceed to next player.
Tie-Break:
None beyond declared draw; equal scoring applies.
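The win, draw, and continue checks can be sketched as follows. The helper names `has_line` and `update_terminal` are hypothetical, and the sketch assumes `current_player` still names the player who just moved when the check runs.

```python
from typing import Any, Dict, List

# The eight winning lines on a 3x3 board: three rows, three columns, two diagonals.
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def has_line(board: List[List[str]], symbol: str) -> bool:
    """True if `symbol` fills any complete row, column, or diagonal."""
    return any(all(board[r][c] == symbol for r, c in line) for line in WIN_LINES)


def update_terminal(state: Dict[str, Any]) -> None:
    """Apply the win/draw/continue checks for the player who just moved."""
    player = state["current_player"]
    opponent = "Moon" if player == "Sun" else "Sun"
    board = state["board_state"]
    if has_line(board, state["player_symbols"][player]):
        state["winner"] = player
        state["status"] = "finished"
        state["scores"][player] = 1
        state["scores"][opponent] = 0
    elif all(cell != "_" for row in board for cell in row):
        state["winner"] = None
        state["status"] = "draw"
        state["scores"] = {"Sun": 0.5, "Moon": 0.5}
    else:
        state["status"] = "ongoing"
```

The win check runs before the draw check, so a ninth move that completes a line counts as a win rather than a draw.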
9. Player Prompt Specification
Prompt Identity and Instructions:
Each turn’s prompt should contain:
- A brief world intro: “You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
- The current board visualization (3×3 grid of `_`, `S`, `M`).
- The list of allowed action formats:
  - `[Mark:<row>,<col>]` where `<row>` and `<col>` are integers 0–2.
  - `[Pass]` if no unclaimed tiles remain.
- Reminder of victory condition: “Align three of your emblems in a straight line.”
- Rule reminder: “All actions must be enclosed in `\boxed{{}}` at the end of your message.”
Few-shot examples:
Example valid response:
I should take the center stone before my rival.
\boxed{{[Mark:1,1]}}
Example invalid response (wrong format):
\boxed{{Mark:1,1}} <-- Missing brackets [ ]
Example valid response (board full, passing):
No moves left, I will pass.
\boxed{{[Pass]}}
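A prompt with the elements listed above could be assembled roughly as shown here. This is a sketch under assumptions: the function names, the space-separated board rendering, and the line order are illustrative choices, and the few-shot examples are left out for brevity.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the grid as three lines of space-separated symbols (assumed layout)."""
    return "\n".join(" ".join(row) for row in board)


def generate_player_prompt(team: str, board: List[List[str]]) -> str:
    """Assemble the per-turn prompt for `team` ("Sun" or "Moon")."""
    lines = [
        f"You are an explorer representing Team {team} claiming tiles "
        "on the ancient Tic-Tac-Trail.",
        "Current board:",
        render_board(board),
        "Allowed actions:",
        "  [Mark:<row>,<col>] where <row> and <col> are integers 0–2.",
        "  [Pass] if no unclaimed tiles remain.",
        "Align three of your emblems in a straight line.",
        # Plain (non-f) string, so the braces are emitted literally as written.
        "All actions must be enclosed in \\boxed{{}} at the end of your message.",
    ]
    return "\n".join(lines)
```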
Extraction Function Notice:
`_extract_answer_content(self, action: str) -> str` will strip the `\boxed{{}}` syntax and return the internal content for validation.
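One way such an extraction helper might work is sketched below as a standalone function. It is not the framework's implementation. As an assumption, it accepts either one or two braces around the content, since the doubled braces in this document may come from template escaping, and it takes the last boxed expression because the action is expected at the end of the message.

```python
import re

# Assumption: tolerate \boxed{...} as well as \boxed{{...}}.
BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+", re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the content of the last boxed expression, or the stripped input if none exists."""
    matches = BOXED_RE.findall(action)
    return matches[-1].strip() if matches else action.strip()
```

Applied to the first few-shot example above, this yields `[Mark:1,1]`. The invalid example yields `Mark:1,1`, which then fails the grammar check.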
10. API Mapping Plan
| Method | Purpose | Operations on Game State | Output |
|---|---|---|---|
| `reset(seed)` | Initialize the game | Sets all keys per schema, seeds the board, assigns the first player (Sun), populates `available_moves`, generates the initial system message | Returns initial observations for both players |
| `step(player_action)` | Process one player's move | 1. Extract content with `_extract_answer_content` 2. Validate grammar & legality 3. If valid, apply to `board_state` 4. Append to `history` 5. Update `available_moves` 6. Check win/draw conditions, adjust scores, and advance turn | Returns updated observations, reward info, done flag |
| `_generate_player_prompt(player_id)` | Builds textual context for that player | Uses the current `board_state`, `turn_count`, and list of legal actions. Demonstrates correct formatting. | Returns formatted prompt string instructing the player to end with a `\boxed{{}}` action |
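The six operations listed for `step` can be sketched end to end as a plain function over the state dictionary. This is an illustration under several assumptions, not the TextArena interface: `new_state` and this `step` signature are hypothetical, illegal moves raise `ValueError` instead of calling `set_invalid_move`, `[Pass]` is omitted, and only the done flag is returned.

```python
import re
from typing import Any, Dict

BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+", re.DOTALL)  # assumed tolerant form
MARK_RE = re.compile(r"\[Mark:(0|1|2),(0|1|2)\]")
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def new_state(seed: int = 0) -> Dict[str, Any]:
    """Initial state per the schema (hypothetical helper)."""
    return {
        "seed": seed, "turn_count": 1, "current_player": "Sun",
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [], "winner": None, "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }


def step(state: Dict[str, Any], player_action: str) -> bool:
    """Apply one Mark action and return the done flag."""
    if state["status"] != "ongoing":
        raise ValueError("Game is already terminal.")
    boxed = BOXED_RE.findall(player_action)                  # 1. extract
    token = boxed[-1].strip() if boxed else ""
    mark = MARK_RE.fullmatch(token)                          # 2. validate
    if mark is None:
        raise ValueError("Invalid format")
    move = [int(mark.group(1)), int(mark.group(2))]
    if move not in state["available_moves"]:
        raise ValueError("Chosen cell already occupied.")
    player = state["current_player"]
    opponent = "Moon" if player == "Sun" else "Sun"
    symbol = state["player_symbols"][player]
    board = state["board_state"]
    board[move[0]][move[1]] = symbol                         # 3. apply
    state["history"].append({"player": player, "message": token})  # 4. log
    state["available_moves"].remove(move)                    # 5. update moves
    state["turn_count"] += 1                                 # 6. terminal checks
    if any(all(board[r][c] == symbol for r, c in line) for line in WIN_LINES):
        state.update(winner=player, status="finished")
        state["scores"] = {player: 1, opponent: 0}
    elif not state["available_moves"]:
        state["status"] = "draw"
        state["scores"] = {"Sun": 0.5, "Moon": 0.5}
    else:
        state["current_player"] = opponent
    return state["status"] != "ongoing"
```

A short scripted game exercises the pipeline: Sun marks (0,0), (0,1), (0,2) while Moon marks (1,0), (1,1), and the fifth call returns `True` with Sun recorded as the winner.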
11. Copy-Check Against the Example
This design is fully distinct from any negotiation or resource-trading environment.
- Theme: Archaeological puzzle arena (grid conquest), not negotiation.
- Objectives: Claim territory and form a line, not reach mutual agreements.
- Entities: Ancient stones, Sun and Moon symbols—not participants in a deal.
- Game State Keys: `board_state`, `player_symbols`, `available_moves`, and `scores` are entirely original.
- Prompt Text: References Tic-Tac-Trail and ancient exploration, not disputes or offers.
Therefore, this specification represents a fully self-contained original turn-based environment for a deterministic tic-tac-toe–style strategy challenge, compliant with TextArena architecture.