test-v0/environment.md
2001-01-01 00:00:00 +00:00

# **Game Design Document: “Orbital Align” (Deterministic Turn-Based Strategy Inspired by Tic-Tac-Toe)**
---
## 1. Concept Paragraph
**Setting & Theme:**
In *Orbital Align*, two rival star captains compete to align their fleets of orbital satellites across a 3×3 planetary grid suspended around a dying star. Unlike classic tic-tac-toe, this version reimagines the board as orbital nodes where each satellite placement represents a strategic claim of spatial control. The goal is to align three satellites in a row—horizontally, vertically, or diagonally—before the opponent does.
**Core action tokens:**
`[Deploy:x,y]` (to place a satellite on coordinates), and `[Scan]` (forfeit placement to reveal the current grid state).
This design is *completely unrelated* to any previous negotiation or resource trading example. It uses a new setting, terminology, and objectives.
---
## 2. Roles and Win Condition
**Roles:**
- **Player A (Commander Solis)** and **Player B (Commander Nyx)** each command a distinct orbital fleet.
- Each player's satellites are marked distinctly (`S` for Solis, `N` for Nyx).
**Win Condition:**
- A player wins if they align **three of their satellites** consecutively in any row, column, or diagonal.
- If all nine grid cells are filled without a winning alignment, the result is a **draw**.
**Loss Condition:**
- A player loses if the opponent achieves an alignment before them.
- A player also loses immediately if they perform an **invalid action** that cannot be corrected within the same turn.
---
## 3. Turn Structure and Determinism
- The game progresses in **alternating turns**, starting with Commander Solis (Player A).
- **Each turn**: Current player chooses one action (`Deploy` or `Scan`).
- Maximum **turn limit**: 9 (the grid has 9 total cells).
- The environment uses a reproducible **random seed**—though this game itself has no stochastic actions, seeding ensures deterministic ordering if future extensions add random elements.
---
## 4. Action Grammar (Machine-Parseable)
**Permitted Action Tokens**
| Action | Meaning | Formal Regex | Example Valid | Example Invalid | Reason Invalid |
|:--|:--|:--|:--|:--|:--|
| `[Deploy:x,y]` | Place a satellite at coordinates (x,y) where x,y ∈ {1,2,3} | `^\[Deploy:(?:[1-3]),(?:[1-3])\]$` | `[Deploy:2,3]` | `[Deploy:4,1]` | 4 outside valid range |
| `[Scan]` | View the current orbital grid instead of placing | `^\[Scan\]$` | `[Scan]` | `[ScanGrid]` | Incorrect token name |
**Rules:**
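- Coordinates (x,y) index the grid as (row, column): x is the row counted from the top and y is the column counted from the left, so (1,1) = top-left, (1,3) = top-right, and (3,3) = bottom-right.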
- No double occupation allowed—if a player tries to `Deploy` on an occupied node, it is invalid.
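The grammar above can be sketched as a small parser. This is a minimal illustration, not the environment's actual code: the function name `parse_action` and its return shape are assumptions, and the pattern uses capturing groups (instead of the non-capturing groups in the table) so the coordinates can be read back.

```python
import re
from typing import Optional, Tuple

# Same patterns as the grammar table, without ^/$ because fullmatch()
# already anchors both ends (and, unlike "$", rejects a trailing newline).
DEPLOY_RE = re.compile(r"\[Deploy:([1-3]),([1-3])\]")
SCAN_RE = re.compile(r"\[Scan\]")

Action = Tuple[str, Optional[Tuple[int, int]]]


def parse_action(token: str) -> Optional[Action]:
    """Return ("Deploy", (x, y)) or ("Scan", None); None if the token is invalid.

    Hypothetical helper: the name and return type are illustrative only.
    """
    match = DEPLOY_RE.fullmatch(token)
    if match:
        return ("Deploy", (int(match.group(1)), int(match.group(2))))
    if SCAN_RE.fullmatch(token):
        return ("Scan", None)
    return None
```

Applied to the table's examples, `parse_action("[Deploy:2,3]")` yields `("Deploy", (2, 3))`, while `[Deploy:4,1]` and `[ScanGrid]` both yield `None`. Occupancy is not checked here because it depends on the board, not on the token.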
---
## 5. Game State Schema
Example serialized game state:
```json
{
  "turn_count": 4,
  "current_player": "Commander Solis",
  "board": [
    ["S", "N", " "],
    [" ", "S", " "],
    ["N", " ", " "]
  ],
  "players": {
    "Commander Solis": {
      "symbol": "S",
      "actions_taken": ["[Deploy:1,1]", "[Deploy:2,2]"]
    },
    "Commander Nyx": {
      "symbol": "N",
      "actions_taken": ["[Deploy:1,2]", "[Deploy:3,1]"]
    }
  },
  "winner": null,
  "is_terminal": false,
  "last_action": "[Deploy:3,1]",
  "observation_log": [
    "Commander Solis deployed to 1,1",
    "Commander Nyx deployed to 1,2",
    "Commander Solis deployed to 2,2",
    "Commander Nyx deployed to 3,1"
  ],
  "seed": 42
}
```
---
## 6. Initialization Rules
- **Board**: Empty 3×3 grid represented as a list of lists containing `" "`.
- **Starting player**: Commander Solis always starts.
- **Seeding**: Random seed (e.g., `seed=42`) stored in `game_state` for deterministic replay.
- **Onboarding observations**: Upon `reset`, each player receives:
  - The empty grid state.
  - Instructions on how to deploy satellites and when the game concludes.
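The initialization rules can be sketched as a function that builds a fresh state with the same keys as the schema in Section 5. The function name `make_initial_state` is hypothetical, and the starting values `turn_count = 0` and `last_action = None` are assumptions, since the document does not state them.

```python
from typing import Any, Dict


def make_initial_state(seed: int = 42) -> Dict[str, Any]:
    """Build an empty game state (hypothetical helper, keys follow Section 5)."""
    return {
        "turn_count": 0,                       # assumed: counts completed turns
        "current_player": "Commander Solis",   # Solis always starts
        # Build each row separately so the rows do not share one list object.
        "board": [[" " for _ in range(3)] for _ in range(3)],
        "players": {
            "Commander Solis": {"symbol": "S", "actions_taken": []},
            "Commander Nyx": {"symbol": "N", "actions_taken": []},
        },
        "winner": None,
        "is_terminal": False,
        "last_action": None,                   # assumed: no action yet
        "observation_log": [],
        "seed": seed,                          # stored for deterministic replay
    }
```

The nested comprehension matters: `[[" "] * 3] * 3` would create three references to the same row, so one deployment would appear in every row.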
---
## 7. Validation and Error Handling
**Validation checks in order:**
1. Verify that the extracted content matches one of the valid action patterns.
2. For `[Deploy:x,y]`, ensure:
   - x, y are within the range 1–3.
   - Target cell is empty.
3. For `[Scan]`, ensure no other content is appended.
4. If the regex or move legality fails, call `set_invalid_move(player, reason)` with one of:
   - `"Malformed action syntax"`
   - `"Coordinates out of range"`
   - `"Target cell occupied"`
   - `"Unrecognized action token"`
Action extraction must strip wrapping `\boxed{{...}}`, leaving only the internal content for validation.
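A minimal sketch of the extraction and validation order described above. Several points are assumptions and not part of the specification: the helper names `extract_action` and `validate_action`, the acceptance of either single or doubled braces around the boxed content, the use of a looser `\d+` pattern so that out-of-range coordinates can be reported separately from malformed syntax, and the mapping of each failure to a reason string.

```python
import re
from typing import List, Optional

# Accept \boxed{...} or \boxed{{...}} (assumption: doubled braces may be
# a template-escaping artifact, so both forms are tolerated here).
BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+", re.DOTALL)
# Looser than the grammar regex so the range check can give its own reason.
DEPLOY_SHAPE_RE = re.compile(r"\[Deploy:(\d+),(\d+)\]")


def extract_action(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = BOXED_RE.findall(response)
    return matches[-1].strip() if matches else None


def validate_action(action: Optional[str], board: List[List[str]]) -> Optional[str]:
    """Return None for a legal action, otherwise one of the four reason strings."""
    if action is None:
        return "Malformed action syntax"       # nothing boxed at all
    if action == "[Scan]":
        return None                            # exact match: nothing appended
    match = DEPLOY_SHAPE_RE.fullmatch(action)
    if match is None:
        if action.startswith("[Deploy:"):
            return "Malformed action syntax"
        return "Unrecognized action token"
    row, col = int(match.group(1)), int(match.group(2))
    if not (1 <= row <= 3 and 1 <= col <= 3):
        return "Coordinates out of range"
    if board[row - 1][col - 1] != " ":
        return "Target cell occupied"
    return None
```

The returned reason string is what a caller would pass on to `set_invalid_move(player, reason)`; that call is left out here because its implementation is not described in this document.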
---
## 8. Terminal Conditions and Scoring
**After each move**, the system checks:
1. **Win Check:**
- Rows, columns, and diagonals scanned for `['S', 'S', 'S']` or `['N', 'N', 'N']`.
- The corresponding player is marked `winner`.
2. **Draw Check:**
- If `turn_count == 9` and no winner ⇒ `"DRAW"`.
3. **Score Rules:**
- Winner = 1, Loser = 0.
- Draw = 0.5 each.
Tie-breakers are deterministic—no randomness or hidden state.
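The win check, draw check, and score rules can be sketched as follows. The function names `find_winner` and `score_game` are hypothetical, and `turn_count` is assumed to count completed turns, matching the `turn_count == 9` draw condition above.

```python
from typing import Dict, List, Optional

Board = List[List[str]]


def find_winner(board: Board) -> Optional[str]:
    """Return "S" or "N" if that symbol fills a row, column, or diagonal."""
    lines = [list(row) for row in board]                               # 3 rows
    lines += [[board[r][c] for r in range(3)] for c in range(3)]       # 3 columns
    lines.append([board[i][i] for i in range(3)])                      # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])                  # anti-diagonal
    for line in lines:
        if line[0] != " " and line[0] == line[1] == line[2]:
            return line[0]
    return None


def score_game(board: Board, turn_count: int) -> Optional[Dict[str, float]]:
    """Return per-player scores if the game is over, else None."""
    symbol = find_winner(board)
    if symbol == "S":
        return {"Commander Solis": 1.0, "Commander Nyx": 0.0}
    if symbol == "N":
        return {"Commander Solis": 0.0, "Commander Nyx": 1.0}
    if turn_count == 9:
        return {"Commander Solis": 0.5, "Commander Nyx": 0.5}          # draw
    return None
```

The win check runs before the draw check, so a winning ninth move is scored as a win and not as a draw.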
---
## 9. Player Prompt Specification
**Prompt Outline:**
> **IDENTITY BLURB:**
> You are a star commander controlling a fleet of satellites orbiting a dying star. Your mission is to align three of your satellites in a row across the 3×3 orbital grid before your rival does.
>
> **CURRENT STATE:**
> - The board shows your placements (S) and your opponent's (N).
> - Empty cells are blank spaces.
>
> **AVAILABLE ACTIONS:**
> - `[Deploy:x,y]` → Place your satellite at coordinates (x,y) where x,y ∈ {1,2,3}.
> - `[Scan]` → Forfeit placement this turn to inspect the full orbital map.
>
> **FORMAT RULES:**
> - Each response must end with: `\boxed{{<action>}}`
> - Example of valid response:
> ```
> I will secure the top-right orbit next.
> \boxed{{[Deploy:1,3]}}
> ```
> - Example of invalid response:
> ```
> Let's attack next time.
> [Deploy:1,3]
> ```
> (Because it's missing `\boxed{{}}`.)
>
> **REMINDERS:**
> - You cannot deploy on an occupied orbit.
> - The game will end immediately if three satellites align or all nine orbits are filled.
All dialogue and moves are appended to the shared `observation_log`.
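The prompt needs a textual view of the board for the **CURRENT STATE** section. The document does not fix a layout, so the sketch below shows one possible rendering; the function name `render_board` and the exact ASCII format are assumptions.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the 3x3 grid with row numbers on the left and column numbers on top.

    Illustrative layout only; the real prompt format is not specified.
    """
    separator = "  +---+---+---+"
    lines = ["    1   2   3", separator]
    for row_number, row in enumerate(board, start=1):
        lines.append(f"{row_number} | " + " | ".join(row) + " |")
        lines.append(separator)
    return "\n".join(lines)
```

Printing row and column numbers next to the grid keeps the rendering aligned with the 1-based coordinates used by `[Deploy:x,y]`.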
---
## 10. API Mapping Plan
| API Method | Purpose | Primary Read/Write | Terminal logic |
|-------------|----------|-------------------|----------------|
| `reset(seed)` | Initializes the grid, assigns symbols, clears logs, and sets starting player. | Writes entire `game_state`. | Returns initial observation and seed confirmation. |
| `step(action)` | Validates the player's boxed action, updates the grid/state, switches turns. | Reads `current_player`, `board`; writes updates, logs. | Runs win/draw checks after every move; sets `is_terminal`, `winner`. |
| `_generate_player_prompt(player)` | Builds textual prompt shown above, embedding the latest board and prior logs. | Reads from `board`, `observation_log`, and `current_player`. | Does not modify state; only generates text. |
On invalid actions, `step` calls `set_invalid_move(player, reason)`, then either lets the player retry or, if the action cannot be corrected within the same turn, ends the game with a loss for that player.
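The state-update half of `step` might look like the sketch below, which applies an action that has already passed validation and then switches turns. The helper name `apply_action`, the log text for `[Scan]`, and the choice to count a `[Scan]` as a completed turn are assumptions; the win/draw checks and invalid-move handling listed in the table are deliberately left out.

```python
from typing import Any, Dict, Optional, Tuple

PLAYERS = ("Commander Solis", "Commander Nyx")
SYMBOLS = {"Commander Solis": "S", "Commander Nyx": "N"}


def apply_action(
    state: Dict[str, Any],
    kind: str,
    cell: Optional[Tuple[int, int]] = None,
) -> Dict[str, Any]:
    """Apply a validated "Deploy" or "Scan" and hand the turn to the other player."""
    player = state["current_player"]
    if kind == "Deploy":
        row, col = cell                                   # 1-based (row, column)
        state["board"][row - 1][col - 1] = SYMBOLS[player]
        action = f"[Deploy:{row},{col}]"
        state["observation_log"].append(f"{player} deployed to {row},{col}")
    else:
        action = "[Scan]"
        # Assumed log wording; the document only shows deploy entries.
        state["observation_log"].append(f"{player} scanned the grid")
    state["players"][player]["actions_taken"].append(action)
    state["last_action"] = action
    state["turn_count"] += 1                              # assumed: Scan uses a turn
    state["current_player"] = PLAYERS[state["turn_count"] % 2]
    return state
```

Deriving the next player from `turn_count % 2` keeps the alternation deterministic: even counts belong to Commander Solis, odd counts to Commander Nyx.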
---
## 11. Copy-Check Against Example
All entity names (**Commander Solis**, **Commander Nyx**, **satellites**, **orbital grid**) and thematic terms are **original** and unrelated to any example negotiation or deal-making scenario. The game's objective (aligning satellites on a 3×3 grid) derives from *tic-tac-toe mechanics* but is expressed in a wholly new narrative context.
All `game_state` keys (`board`, `winner`, `observation_log`, `symbol`, etc.) are unique to *Orbital Align*, and none are borrowed from any trading, diplomacy, or economic system.