
Game Design Document: “Orbital Align” (Deterministic Turn-Based Strategy Inspired by Tic-Tac-Toe)


1. Concept Paragraph

Setting & Theme:
In Orbital Align, two rival star captains compete to align their fleets of orbital satellites across a 3×3 planetary grid suspended around a dying star. Unlike classic tic-tac-toe, this version reimagines the board as orbital nodes where each satellite placement represents a strategic claim of spatial control. The goal is to align three satellites in a row—horizontally, vertically, or diagonally—before the opponent does.
Core action tokens:
[Deploy:x,y] (to place a satellite on coordinates), and [Scan] (forfeit placement to reveal the current grid state).
This design is completely unrelated to any previous negotiation or resource trading example. It uses a new setting, terminology, and objectives.


2. Roles and Win Condition

Roles:

  • Player A (Commander Solis) and Player B (Commander Nyx) each command a distinct orbital fleet.
  • Each player's satellites are marked distinctly (S for Solis, N for Nyx).

Win Condition:

  • A player wins if they align three of their satellites consecutively in any row, column, or diagonal.
  • If all nine grid cells are filled without a winning alignment, the result is a draw.

Loss Condition:

  • A player loses if the opponent achieves an alignment before them.
  • A player also loses immediately if they perform an invalid action that cannot be corrected within the same turn.

3. Turn Structure and Determinism

  • The game progresses in alternating turns, starting with Commander Solis (Player A).
  • Each turn: Current player chooses one action (Deploy or Scan).
  • Maximum turn limit: 9 (the grid has 9 total cells).
  • The environment stores a reproducible random seed. The game itself has no stochastic actions, but seeding keeps ordering deterministic if future extensions add random elements.

4. Action Grammar (Machine-Parseable)

Permitted Action Tokens

| Action | Meaning | Formal Regex | Example Valid | Example Invalid | Reason Invalid |
| --- | --- | --- | --- | --- | --- |
| [Deploy:x,y] | Place a satellite at coordinates (x,y) where x,y ∈ {1,2,3} | ^\[Deploy:(?:[1-3]),(?:[1-3])\]$ | [Deploy:2,3] | [Deploy:4,1] | 4 outside valid range |
| [Scan] | View the current orbital grid instead of placing | ^\[Scan\]$ | [Scan] | [ScanGrid] | Incorrect token name |

Rules:

  • Coordinates (x,y) index the grid as (row, column): (1,1) = top-left, (1,3) = top-right, (3,3) = bottom-right.
  • No double occupation is allowed: a Deploy onto an occupied node is invalid.
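
The grammar above can be exercised with a short parser. This is a minimal sketch, not the environment's actual code: the function name parse_action and its return shape are assumptions, and the Deploy pattern uses capturing groups (the table's pattern is non-capturing) so the coordinates can be read back.

```python
import re
from typing import Optional, Tuple

# Patterns follow the action table; groups are capturing here (an assumption)
# so that x and y can be extracted after a successful match.
DEPLOY_RE = re.compile(r"^\[Deploy:([1-3]),([1-3])\]$")
SCAN_RE = re.compile(r"^\[Scan\]$")


def parse_action(token: str) -> Optional[Tuple[str, Optional[Tuple[int, int]]]]:
    """Return ("deploy", (x, y)), ("scan", None), or None for an invalid token."""
    # fullmatch rejects trailing newlines that a bare "$" would tolerate.
    m = DEPLOY_RE.fullmatch(token)
    if m:
        return ("deploy", (int(m.group(1)), int(m.group(2))))
    if SCAN_RE.fullmatch(token):
        return ("scan", None)
    return None


print(parse_action("[Deploy:2,3]"))
print(parse_action("[ScanGrid]"))
```

Occupancy is deliberately not checked here, since it depends on the board rather than on the token's syntax.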

5. Game State Schema

Example serialized game state:

{
  "turn_count": 4,
  "current_player": "Commander Solis",
  "board": [
    ["S", "N", " "],
    [" ", "S", " "],
    ["N", " ", " "]
  ],
  "players": {
    "Commander Solis": {
      "symbol": "S",
      "actions_taken": ["[Deploy:1,1]", "[Deploy:2,2]"]
    },
    "Commander Nyx": {
      "symbol": "N",
      "actions_taken": ["[Deploy:1,2]", "[Deploy:3,1]"]
    }
  },
  "winner": null,
  "is_terminal": false,
  "last_action": "[Deploy:3,1]",
  "observation_log": [
    "Commander Solis deployed to 1,1",
    "Commander Nyx deployed to 1,2",
    "Commander Solis deployed to 2,2",
    "Commander Nyx deployed to 3,1"
  ],
  "seed": 42
}

6. Initialization Rules

  • Board: Empty 3×3 grid represented as a list of lists containing " ".
  • Starting player: Commander Solis always starts.
  • Seeding: Random seed (e.g., seed=42) stored in game_state for deterministic replay.
  • Onboarding observations:
    Upon reset, each player receives:
    • The empty grid state.
    • Instructions on how to deploy satellites and when the game concludes.
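
The initialization rules can be sketched as a small state builder. The key names come from the schema in section 5; the function name make_initial_state and the starting values turn_count = 0 and last_action = None are assumptions, since the document does not state them.

```python
from typing import Any, Dict


def make_initial_state(seed: int = 42) -> Dict[str, Any]:
    """Build a fresh game_state with an empty 3x3 board (hypothetical helper)."""
    return {
        "turn_count": 0,  # assumption: counts completed turns
        "current_player": "Commander Solis",  # Solis always starts
        # Build each row separately so the rows do not share one list object.
        "board": [[" "] * 3 for _ in range(3)],
        "players": {
            "Commander Solis": {"symbol": "S", "actions_taken": []},
            "Commander Nyx": {"symbol": "N", "actions_taken": []},
        },
        "winner": None,
        "is_terminal": False,
        "last_action": None,  # assumption: no action before the first turn
        "observation_log": [],
        "seed": seed,  # stored for deterministic replay
    }


state = make_initial_state(seed=42)
print(state["board"])
```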

7. Validation and Error Handling

Validation checks in order:

  1. Verify that the extracted content matches one of the valid action patterns.
  2. For [Deploy:x,y], ensure:
    • x, y within range 1–3.
    • Target cell is empty.
  3. For [Scan], ensure no other content is appended.
  4. If the regex or move legality fails, call
    set_invalid_move(player, reason)
    with one of:
    • "Malformed action syntax"
    • "Coordinates out of range"
    • "Target cell occupied"
    • "Unrecognized action token"

Action extraction must strip wrapping \boxed{{...}}, leaving only the internal content for validation.
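
The extraction and validation order can be sketched as follows. The names extract_action and validate_action are hypothetical, and so is the mapping of failures to reason strings: a bracketed token with an unknown name is reported as "Unrecognized action token", and anything else that fails to parse as "Malformed action syntax". The sketch also assumes x is the row and y the column, and it accepts either single or double braces around the boxed content.

```python
import re
from typing import List, Optional

# Matches a trailing \boxed{...} or \boxed{{...}} and captures the inner content.
BOXED_RE = re.compile(r"\\boxed\{+([^{}]*)\}+\s*$")
# Looser than the grammar's regex so out-of-range coordinates get their own reason.
LOOSE_DEPLOY_RE = re.compile(r"\[Deploy:(-?\d+),(-?\d+)\]")
NAMED_TOKEN_RE = re.compile(r"\[([A-Za-z]+)(?::.*)?\]")


def extract_action(response: str) -> Optional[str]:
    """Strip the \\boxed wrapper; return None if the response has no boxed action."""
    m = BOXED_RE.search(response)
    return m.group(1).strip() if m else None


def validate_action(token: str, board: List[List[str]]) -> Optional[str]:
    """Return None for a legal action, otherwise one of the four reason strings."""
    if token == "[Scan]":
        return None
    m = LOOSE_DEPLOY_RE.fullmatch(token)
    if m:
        x, y = int(m.group(1)), int(m.group(2))
        if not (1 <= x <= 3 and 1 <= y <= 3):
            return "Coordinates out of range"
        if board[x - 1][y - 1] != " ":  # assumption: x = row, y = column
            return "Target cell occupied"
        return None
    named = NAMED_TOKEN_RE.fullmatch(token)
    if named and named.group(1) not in ("Deploy", "Scan"):
        return "Unrecognized action token"
    return "Malformed action syntax"
```

A caller would pass the returned reason, when it is not None, to set_invalid_move(player, reason).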


8. Terminal Conditions and Scoring

After each move, the system checks:

  1. Win Check:
    • Rows, columns, and diagonals scanned for ['S', 'S', 'S'] or ['N', 'N', 'N'].
    • The corresponding player is marked winner.
  2. Draw Check:
    • If turn_count == 9 and no winner ⇒ "DRAW".
  3. Score Rules:
    • Winner = 1, Loser = 0.
    • Draw = 0.5 each.

Tie-breakers are deterministic—no randomness or hidden state.
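
The win, draw, and score checks above can be sketched in a few lines. The helper names all_lines, find_winner_symbol, and score are assumptions, and so is returning None while the game is still in progress.

```python
from typing import Dict, List, Optional

Board = List[List[str]]


def all_lines(board: Board) -> List[List[str]]:
    """Collect the 3 rows, 3 columns, and 2 diagonals of the grid."""
    rows = [list(row) for row in board]
    cols = [[board[r][c] for r in range(3)] for c in range(3)]
    diags = [[board[i][i] for i in range(3)], [board[i][2 - i] for i in range(3)]]
    return rows + cols + diags


def find_winner_symbol(board: Board) -> Optional[str]:
    """Return "S" or "N" if that symbol fills a whole line, else None."""
    for line in all_lines(board):
        if line[0] != " " and line[0] == line[1] == line[2]:
            return line[0]
    return None


def score(board: Board, turn_count: int) -> Optional[Dict[str, float]]:
    """Apply the score rules; None means the game is not over yet."""
    symbol = find_winner_symbol(board)
    if symbol == "S":
        return {"Commander Solis": 1.0, "Commander Nyx": 0.0}
    if symbol == "N":
        return {"Commander Solis": 0.0, "Commander Nyx": 1.0}
    if turn_count == 9:  # draw condition as stated in the Draw Check
        return {"Commander Solis": 0.5, "Commander Nyx": 0.5}
    return None
```

The win check runs before the draw check so that a ninth move that completes a line still counts as a win.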


9. Player Prompt Specification

Prompt Outline:

IDENTITY BLURB:
You are a star commander controlling a fleet of satellites orbiting a dying star. Your mission is to align three of your satellites in a row across the 3×3 orbital grid before your rival does.

CURRENT STATE:

  • The board shows your placements (S) and your opponent's (N).
  • Empty cells are blank spaces.

AVAILABLE ACTIONS:

  • [Deploy:x,y] → Place your satellite at coordinates (x,y) where x,y ∈ {1,2,3}.
  • [Scan] → Forfeit placement this turn to inspect the full orbital map.

FORMAT RULES:

  • Each response must end with: \boxed{{<action>}}
  • Example of valid response:
    I will secure the top-right orbit next.
    \boxed{{[Deploy:1,3]}}
    
  • Example of invalid response:
    Let's attack next time.
    [Deploy:1,3]
    
    (Because it's missing \boxed{{}}.)

REMINDERS:

  • You cannot deploy on an occupied orbit.
  • The game will end immediately if three satellites align or all nine orbits are filled.

All dialogue and moves are appended to the shared observation_log.
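
One way to assemble the prompt outlined above is sketched here. The function names render_board and build_prompt, the " | " cell separator, and the exact wording of each line are assumptions. The symbols are passed in as parameters so the same text works for either commander, and the double braces around the boxed action are reproduced literally from the format rules.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the 3x3 grid as three text rows; empty cells stay blank."""
    return "\n".join(" | ".join(row) for row in board)


def build_prompt(board: List[List[str]], own_symbol: str, rival_symbol: str) -> str:
    """Assemble the identity blurb, state, actions, and format rules as plain text."""
    return "\n".join([
        "You are a star commander controlling a fleet of satellites orbiting a dying star.",
        "Align three of your satellites in a row across the 3x3 orbital grid before your rival does.",
        "",
        "CURRENT STATE:",
        render_board(board),
        f"Your satellites are marked {own_symbol}; your rival's are marked {rival_symbol}.",
        "",
        "AVAILABLE ACTIONS:",
        "  [Deploy:x,y] -> place your satellite at (x,y), with x,y in {1,2,3}",
        "  [Scan] -> forfeit placement this turn to inspect the full orbital map",
        "",
        "FORMAT RULES:",
        # Plain (non-f) string: the braces are emitted exactly as written.
        "  End your response with \\boxed{{<action>}}",
    ])


example = [["S", "N", " "], [" ", "S", " "], ["N", " ", " "]]
print(build_prompt(example, "S", "N"))
```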


10. API Mapping Plan

| API Method | Purpose | Primary Read/Write | Terminal logic |
| --- | --- | --- | --- |
| reset(seed) | Initializes the grid, assigns symbols, clears logs, and sets starting player. | Writes entire game_state. | Returns initial observation and seed confirmation. |
| step(action) | Validates the player's boxed action, updates the grid/state, switches turns. | Reads current_player, board; writes updates, logs. | Runs win/draw checks after every move; sets is_terminal, winner. |
| _generate_player_prompt(player) | Builds the textual prompt shown above, embedding the latest board and prior logs. | Reads from board, observation_log, and current_player. | Does not modify state; only generates text. |

On invalid actions, step calls set_invalid_move(player, reason) and either forces a retry or, if the action cannot be corrected within the same turn, ends the game with a loss for that player.
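
The reset/step contract can be illustrated with a compact environment class. This is a sketch under several assumptions rather than the real implementation: the class name OrbitalAlignEnv, the (done, state) return value of step, the "scanned the grid" log text, and the simplification that any invalid move ends the game immediately with no retry. A draw is represented here by is_terminal = True with winner left as None.

```python
import re
from typing import Any, Dict, Tuple

BOXED_RE = re.compile(r"\\boxed\{+([^{}]*)\}+\s*$")
DEPLOY_RE = re.compile(r"\[Deploy:([1-3]),([1-3])\]")
PLAYERS = ("Commander Solis", "Commander Nyx")
SYMBOLS = {"Commander Solis": "S", "Commander Nyx": "N"}


class OrbitalAlignEnv:
    def reset(self, seed: int = 42) -> Dict[str, Any]:
        self.state: Dict[str, Any] = {
            "turn_count": 0, "current_player": PLAYERS[0],
            "board": [[" "] * 3 for _ in range(3)],
            "winner": None, "is_terminal": False,
            "observation_log": [], "seed": seed,
        }
        return self.state

    def set_invalid_move(self, player: str, reason: str) -> None:
        # Simplifying assumption: no retry, the opponent wins at once.
        s = self.state
        s["observation_log"].append(f"{player} made an invalid move: {reason}")
        s["winner"] = PLAYERS[1 - PLAYERS.index(player)]
        s["is_terminal"] = True

    def step(self, response: str) -> Tuple[bool, Dict[str, Any]]:
        s = self.state
        player = s["current_player"]
        boxed = BOXED_RE.search(response)
        token = boxed.group(1).strip() if boxed else ""
        m = DEPLOY_RE.fullmatch(token)
        if m:
            r, c = int(m.group(1)) - 1, int(m.group(2)) - 1  # x = row, y = column
            if s["board"][r][c] != " ":
                self.set_invalid_move(player, "Target cell occupied")
                return True, s
            s["board"][r][c] = SYMBOLS[player]
            s["observation_log"].append(f"{player} deployed to {r + 1},{c + 1}")
        elif token == "[Scan]":
            s["observation_log"].append(f"{player} scanned the grid")
        else:
            self.set_invalid_move(player, "Malformed action syntax")
            return True, s
        s["turn_count"] += 1
        self._check_terminal(player)
        if not s["is_terminal"]:
            s["current_player"] = PLAYERS[1 - PLAYERS.index(player)]
        return s["is_terminal"], s

    def _check_terminal(self, player: str) -> None:
        b, sym = self.state["board"], SYMBOLS[player]
        lines = [list(row) for row in b]
        lines += [[b[r][c] for r in range(3)] for c in range(3)]
        lines += [[b[i][i] for i in range(3)], [b[i][2 - i] for i in range(3)]]
        if any(line == [sym] * 3 for line in lines):
            self.state["winner"] = player
            self.state["is_terminal"] = True
        elif self.state["turn_count"] == 9:  # draw: turn limit reached, no winner
            self.state["is_terminal"] = True
```

A short session might call env.reset(42) and then feed each model response to env.step(...) until the first element of the returned tuple is True.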


11. Copy-Check Against Example

All entity names (Commander Solis, Commander Nyx, satellites, orbital grid) and thematic terms are original and unrelated to any example negotiation or deal-making scenario. The game's objective (aligning satellites on a 3×3 grid) derives from tic-tac-toe mechanics but is expressed in a wholly new narrative context.
All game_state keys (board, winner, observation_log, symbol, etc.) are unique to Orbital Align, and none are borrowed from any trading, diplomacy, or economic system.