1. Concept Paragraph

Concept Overview:
Tic-Tac-Trail is a deterministic, turn-based tactical puzzle built on grid conquest; it is completely unrelated to negotiation or trade mechanics. Two explorer teams, Team Sun and Team Moon, compete to claim paths on an ancient 3×3 stone map. Each tile can be marked with one team's emblem (Sun or Moon). The first expedition to align three of its emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple's power and wins. Core player commands are expressed as [Mark:<row>,<col>], naming the grid position to claim, or [Pass] if no legal move remains. The environment tracks placement, board state, turn order, and victory conditions deterministically.


2. Roles and Win Condition

  • Players:

    • Player 1: Team Sun (symbol “S”)
    • Player 2: Team Moon (symbol “M”)
  • Objective:
    Align three of one's symbols (S or M) in a straight line (row, column, or diagonal) before the board fills.

  • Decision Rules:

    • Win: First player to form an unbroken trio of their own emblem.
    • Loss: Opponent achieves a trio first.
    • Draw: All nine tiles filled without a winning alignment.
    • Once a win or draw occurs, the game becomes terminal and no further moves are accepted.

3. Turn Structure and Determinism

  • The game alternates turns strictly: Sun → Moon → Sun → Moon, and so on.
  • Turn count begins at 1 and increments after each valid action.
  • Maximum of nine turns (since there are nine cells).
  • No random factors exist; the game is fully deterministic.
  • A seed value is still stored in state for reproducibility, but it is never used, so replays are always identical.
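
The strict alternation above can be sketched as a pure function of the turn counter. This is a minimal illustration, assuming the counter starts at 1 as stated; the helper name player_for_turn is hypothetical and does not come from the specification.

```python
PLAYERS = ("Sun", "Moon")  # Sun always moves first


def player_for_turn(turn_count: int) -> str:
    """Map a 1-based turn counter to the team that moves on that turn."""
    if not 1 <= turn_count <= 9:
        raise ValueError("turn_count must be between 1 and 9")
    # Odd turns belong to Sun, even turns to Moon.
    return PLAYERS[(turn_count - 1) % 2]
```

Because nothing here depends on the seed, replaying the same action sequence always yields the same turn order.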

4. Action Grammar (Machine-Parseable)

Permitted Actions:

4.1 Mark a Tile

  • Token Format: [Mark:<row>,<col>]
  • Pattern (regex): ^\[Mark:(0|1|2),(0|1|2)\]$
  • Semantics: Current player places their symbol on the specified cell (row, col) if it is empty.

Examples:

  • Valid: [Mark:0,2] (player marks the top-right cell).
  • Invalid: [Mark:3,1] (row "3" is out of range; valid rows are 0-2).
  • Invalid: [Mark:1-2] (the comma separator is missing).

4.2 Pass

  • Token Format: [Pass]
  • Pattern (regex): ^\[Pass\]$
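  • Semantics: Legal only when the player has no empty cell to claim. Because a full board already ends the game (Section 2), this should not arise in normal play.

Examples:

  • Valid: [Pass] (accepted only when no empty cell remains).
  • Invalid: [PASS] (the token is case-sensitive and must match [Pass] exactly).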
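
The two patterns above can be applied directly with Python's re module. The following is a minimal sketch; the function name parse_action and its return convention (a tuple for a mark, the string "pass" for a pass, None otherwise) are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple, Union

# Patterns copied from sections 4.1 and 4.2.
MARK_RE = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_RE = re.compile(r"^\[Pass\]$")


def parse_action(token: str) -> Optional[Union[Tuple[int, int], str]]:
    """Return (row, col) for a Mark token, "pass" for a Pass token, else None."""
    match = MARK_RE.fullmatch(token)
    if match:
        return int(match.group(1)), int(match.group(2))
    if PASS_RE.fullmatch(token):
        return "pass"
    return None
```

Using fullmatch rather than match is a deliberate choice here: it also rejects a token followed by a stray trailing newline.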

5. Game State Schema

{
  "seed": 42,
  "turn_count": 1,
  "current_player": "Sun",
  "board_state": [
    ["_", "_", "_"],
    ["_", "_", "_"],
    ["_", "_", "_"]
  ],
  "player_symbols": {
    "Sun": "S",
    "Moon": "M"
  },
  "history": [
    {"player": "System", "message": "The ancient board awaits."}
  ],
  "winner": null,
  "status": "ongoing",
  "available_moves": [
    [0, 0], [0, 1], [0, 2],
    [1, 0], [1, 1], [1, 2],
    [2, 0], [2, 1], [2, 2]
  ],
  "scores": {
    "Sun": 0,
    "Moon": 0
  }
}
  • Keys reflect a unique thematic world: the ancient “trail” board, emblems for “Sun” and “Moon,” and clear distinction from any negotiation-like schema.

6. Initialization Rules

  • When reset(seed) is called:
    1. The RNG is seeded with seed, although no game logic draws from it.
    2. The board_state is filled with _ symbols representing empty stone tiles.
    3. The first turn is always Sun.
    4. The history log begins with a world description.
    5. available_moves includes all (row, col) pairs.
  • Observation: Both players receive identical initial description and empty board visualization.
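
The initialization steps above could be sketched as a function that builds the state dictionary from section 5. The name make_initial_state is hypothetical, and the local random.Random instance is only one possible way to honor the "seeded but unused" rule.

```python
import random
from typing import Any, Dict


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Build a fresh state matching the schema in section 5."""
    random.Random(seed)  # seeded per step 1; nothing ever draws from it
    return {
        "seed": seed,
        "turn_count": 1,
        "current_player": "Sun",  # Sun always moves first
        # Build each row separately so rows do not share one list object.
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [{"player": "System", "message": "The ancient board awaits."}],
        "winner": None,
        "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }
```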

7. Validation and Error Handling

  • Extraction:
    The environment will extract content from within \boxed{{}} using _extract_answer_content(action).

  • Validation Steps:

    1. Verify action string matches one of the two regex patterns.
    2. If [Mark:<r>,<c>], check:
      • 0 ≤ r,c ≤ 2
      • Corresponding cell is unoccupied ("_").
    3. If [Pass], ensure no playable cells remain; otherwise invalid.
  • Invalid Reasons (examples):

    • "Invalid format — must be [Mark:r,c] or [Pass]."
    • "Chosen cell already occupied."
    • "Row or column index out of range."
    • "Cannot pass while moves still available."

If the action is invalid, the system invokes set_invalid_move(reason); forfeit logic may then apply, depending on the higher-level controller.
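
The validation steps can be sketched as a single function that returns one of the listed reason strings, or None for a legal action. One detail is worth noting: the strict pattern from section 4.1 already rejects indices above 2, so the "out of range" message can only be produced if a looser pattern is tried first. The loose pattern and the name invalid_reason below are assumptions, not part of the specification.

```python
import re
from typing import List, Optional

# Assumption: accept any non-negative integers first so that range errors
# can be reported separately from format errors.
LOOSE_MARK_RE = re.compile(r"\[Mark:([0-9]+),([0-9]+)\]")


def invalid_reason(token: str, board: List[List[str]]) -> Optional[str]:
    """Return None if the action is legal, else a reason from section 7."""
    if token == "[Pass]":
        has_empty = any(cell == "_" for row in board for cell in row)
        return "Cannot pass while moves still available." if has_empty else None
    match = LOOSE_MARK_RE.fullmatch(token)
    if match is None:
        return "Invalid format \u2014 must be [Mark:r,c] or [Pass]."
    row, col = int(match.group(1)), int(match.group(2))
    if row > 2 or col > 2:
        return "Row or column index out of range."
    if board[row][col] != "_":
        return "Chosen cell already occupied."
    return None
```

A caller would pass any non-None result to set_invalid_move(reason).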


8. Terminal Conditions and Scoring

Checks performed after each valid move:

  1. Win Check:
    If the current player owns three symbols aligned horizontally, vertically, or diagonally:

    • winner = current_player
    • status = "finished"
    • scores[current_player] = 1
    • Opponent receives 0.
  2. Draw Check:
    If all cells filled and no winner:

    • winner = null
    • status = "draw"
    • Both scores = 0.5.
  3. Continue Otherwise:

    • status = "ongoing"
    • Proceed to next player.

Tie-Break:
None beyond declared draw; equal scoring applies.
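
The three checks above can be sketched as one function that is called after each valid move. The name evaluate and the returned (status, winner, scores) tuple are illustrative assumptions; the status strings and score values are taken from this section.

```python
from typing import Dict, List, Optional, Tuple

# The eight winning lines: three rows, three columns, two diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def evaluate(
    board: List[List[str]], current_player: str, symbols: Dict[str, str]
) -> Tuple[str, Optional[str], Dict[str, float]]:
    """Return (status, winner, scores) after current_player has just moved."""
    symbol = symbols[current_player]
    opponent = next(p for p in symbols if p != current_player)
    # 1. Win check: only the player who just moved can have completed a line.
    if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
        return "finished", current_player, {current_player: 1, opponent: 0}
    # 2. Draw check: the board is full and nobody has won.
    if all(cell != "_" for row in board for cell in row):
        return "draw", None, {current_player: 0.5, opponent: 0.5}
    # 3. Otherwise play continues with unchanged scores.
    return "ongoing", None, {current_player: 0, opponent: 0}
```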


9. Player Prompt Specification

Prompt Identity and Instructions:

Each turn's prompt should contain:

  1. A brief world intro:
    “You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
  2. The current board visualization (3×3 grid of _, S, M).
  3. The list of allowed action formats:
    • [Mark:<row>,<col>] where <row> and <col> are integers 0-2.
    • [Pass] if no unclaimed tiles remain.
  4. Reminder of victory condition: “Align three of your emblems in a straight line.”
  5. Rule reminder: “All actions must be enclosed in \boxed{{}} at the end of your message.”

Few-shot examples:

Example valid response:
I should take the center stone before my rival.
\boxed{{[Mark:1,1]}}
Example invalid response (wrong format):
\boxed{{Mark:1,1}}   <-- Missing brackets [ ]
Example valid response (board full, passing):
No moves left, I will pass.
\boxed{{[Pass]}}
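
A prompt containing the five required elements could be assembled as follows. This is a minimal sketch: the function names, the space-separated board layout, and the section labels such as "Current board:" are assumptions, while the quoted sentences come from the list above.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the 3x3 grid as three space-separated rows."""
    return "\n".join(" ".join(row) for row in board)


def generate_player_prompt(player: str, board: List[List[str]]) -> str:
    """Assemble the per-turn prompt for 'Sun' or 'Moon'."""
    lines = [
        f"You are an explorer representing Team {player} "
        "claiming tiles on the ancient Tic-Tac-Trail.",
        "Current board:",
        render_board(board),
        "Allowed actions:",
        "  [Mark:<row>,<col>] where <row> and <col> are integers 0-2.",
        "  [Pass] if no unclaimed tiles remain.",
        "Victory condition: Align three of your emblems in a straight line.",
        # Raw string so the backslash in \boxed is kept literally.
        r"All actions must be enclosed in \boxed{{}} at the end of your message.",
    ]
    return "\n".join(lines)
```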

Extraction Function Notice:
_extract_answer_content(self, action: str) -> str will strip \boxed{{}} syntax and return internal content for validation.
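
The specification gives only the signature of the extraction helper, so the following standalone function is a guess at its behavior rather than the actual implementation. It assumes that the last \boxed wrapper in the message is the one that counts, that both the double-brace form shown in the examples and a single-brace form should be accepted, and that a message without any wrapper is returned stripped.

```python
import re

# Try the double-brace form first, then fall back to single braces.
_BOXED_RE = re.compile(r"\\boxed\{\{(.*?)\}\}|\\boxed\{(.*?)\}", re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the text inside the last \\boxed wrapper, stripped of whitespace."""
    matches = _BOXED_RE.findall(action)
    if not matches:
        # Assumption: with no wrapper present, hand back the raw text.
        return action.strip()
    double, single = matches[-1]
    return (double or single).strip()
```

The returned string is what the validation step then checks against the two action patterns.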


10. API Mapping Plan

| Method | Purpose | Operations on Game State | Output |
| --- | --- | --- | --- |
| reset(seed) | Initialize the game | Sets all keys per schema, seed board, assign first player (Sun), populate available_moves, generate initial system message | Returns initial observations for both players |
| step(player_action) | Process one player's move | 1. Extract content with _extract_answer_content 2. Validate grammar & legality 3. If valid, apply to board_state 4. Append to history 5. Update available_moves 6. Check win/draw conditions, adjust scores, and advance turn | Returns updated observations, reward info, done flag |
| _generate_player_prompt(player_id) | Builds textual context for that player | Uses the current board_state, turn_count, and list of legal actions. Demonstrates correct formatting. | Returns formatted prompt string instructing the player to end with a \boxed{{}} action |
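
To show how reset and step fit together, here is a compact sketch of the state transitions only. The class TicTacTrailSketch is hypothetical and is not the TextArena base-class API: step takes an already-extracted token, returns (state, done, reason) instead of observations and rewards, and omits [Pass], which cannot be legal before the game is terminal. The two short reason strings for a finished game and a bad format are placeholders invented for this sketch.

```python
import re
from typing import Any, Dict, Optional, Tuple

MARK_RE = re.compile(r"\[Mark:([0-2]),([0-2])\]")
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


class TicTacTrailSketch:
    """Hypothetical stand-in for the environment class."""

    def reset(self, seed: int) -> Dict[str, Any]:
        self.state = {
            "seed": seed, "turn_count": 1, "current_player": "Sun",
            "board_state": [["_"] * 3 for _ in range(3)],
            "player_symbols": {"Sun": "S", "Moon": "M"},
            "history": [{"player": "System", "message": "The ancient board awaits."}],
            "winner": None, "status": "ongoing",
            "available_moves": [[r, c] for r in range(3) for c in range(3)],
            "scores": {"Sun": 0, "Moon": 0},
        }
        return self.state

    def step(self, token: str) -> Tuple[Dict[str, Any], bool, Optional[str]]:
        s = self.state
        if s["status"] != "ongoing":
            return s, True, "Game is already over."  # placeholder reason
        match = MARK_RE.fullmatch(token)
        if match is None:
            return s, False, "Invalid format."  # placeholder reason
        r, c = int(match.group(1)), int(match.group(2))
        if [r, c] not in s["available_moves"]:
            return s, False, "Chosen cell already occupied."
        player = s["current_player"]
        other = "Moon" if player == "Sun" else "Sun"
        symbol = s["player_symbols"][player]
        s["board_state"][r][c] = symbol                    # apply the move
        s["history"].append({"player": player, "message": token})
        s["available_moves"].remove([r, c])
        board = s["board_state"]
        if any(all(board[i][j] == symbol for i, j in line) for line in LINES):
            s["winner"], s["status"] = player, "finished"
            s["scores"] = {player: 1, other: 0}
        elif not s["available_moves"]:
            s["status"] = "draw"
            s["scores"] = {"Sun": 0.5, "Moon": 0.5}
        else:
            # Interpretation: the counter advances only while play continues,
            # so it never exceeds nine.
            s["current_player"] = other
            s["turn_count"] += 1
        return s, s["status"] != "ongoing", None
```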

11. Copy-Check Against the Example

This design is fully distinct from any negotiation or resource-trading environment.

  • Theme: Archaeological puzzle arena (grid conquest), not negotiation.
  • Objectives: Claim territory and form a line, not reach mutual agreements.
  • Entities: Ancient stones, Sun and Moon symbols—not participants in a deal.
  • Game State Keys: board_state, player_symbols, available_moves, and scores—entirely original.
  • Prompt Text: References Tic-Tac-Trail and ancient exploration, not disputes or offers.

Therefore, this specification represents a fully self-contained, original turn-based environment for a deterministic tic-tac-toe-style strategy challenge, compliant with the TextArena architecture.