Add environment documentation from Openverse builder

2001-01-01 00:00:00 +00:00
parent 62cd544aaf
commit a3fe4321df
1 changed files with 193 additions and 0 deletions
--- a/environment.md
+++ b/environment.md
@@ -0,0 +1,193 @@
+# **Game Design Document: “Orbital Align” (Deterministic Turn-Based Strategy Inspired by Tic-Tac-Toe)**
+
+---
+
+## 1. Concept Paragraph
+
+**Setting & Theme:**  
+In *Orbital Align*, two rival star captains compete to align their fleets of orbital satellites across a 3×3 planetary grid suspended around a dying star. Unlike classic tic-tac-toe, this version reimagines the board as orbital nodes where each satellite placement represents a strategic claim of spatial control. The goal is to align three satellites in a row—horizontally, vertically, or diagonally—before the opponent does.  
+**Core action tokens:**  
+`[Deploy:x,y]` (to place a satellite on coordinates), and `[Scan]` (forfeit placement to reveal the current grid state).  
+This design is *completely unrelated* to any previous negotiation or resource trading example. It uses a new setting, terminology, and objectives.
+
+---
+
+## 2. Roles and Win Condition
+
+**Roles:**  
+- **Player A (Commander Solis)** and **Player B (Commander Nyx)** each command a distinct orbital fleet.
+- Each player’s satellite is marked distinctly (`S` for Solis, `N` for Nyx).
+
+**Win Condition:**  
+- A player wins if they align **three of their satellites** consecutively in any row, column, or diagonal.
+- If all nine grid cells are filled without a winning alignment, the result is a **draw**.
+
+**Loss Condition:**  
+- A player loses if the opponent achieves an alignment before them.
+- A player also loses immediately if they perform an **invalid action** that cannot be corrected within the same turn.
+
+---
+
+## 3. Turn Structure and Determinism
+
+- The game progresses in **alternating turns**, starting with Commander Solis (Player A).  
+- **Each turn**: Current player chooses one action (`Deploy` or `Scan`).  
+- Maximum **turn limit**: 9 (the grid has 9 total cells).  
+- The environment stores a reproducible **random seed**. The game itself has no stochastic actions; the seed only keeps behavior deterministic if future extensions add random elements.
+
+---
+
+## 4. Action Grammar (Machine-Parseable)
+
+**Permitted Action Tokens**  
+
+| Action | Meaning | Formal Regex | Example Valid | Example Invalid | Reason Invalid |
+|:--|:--|:--|:--|:--|:--|
+| `[Deploy:x,y]` | Place a satellite at coordinates (x,y) where x,y ∈ {1,2,3} | `^\[Deploy:(?:[1-3]),(?:[1-3])\]$` | `[Deploy:2,3]` | `[Deploy:4,1]` | 4 outside valid range |
+| `[Scan]` | View the current orbital grid instead of placing | `^\[Scan\]$` | `[Scan]` | `[ScanGrid]` | Incorrect token name |
+
+**Rules:**
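+- Coordinates (x,y) correspond to the grid: (1,1) = top-left, (3,3) = bottom-right. Here x is the row (counted from the top) and y is the column (counted from the left), so `[Deploy:1,3]` is the top-right node.
+- No double occupation allowed: a `Deploy` on an occupied node is invalid.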
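+
+A minimal Python sketch of how the two patterns in the table might be applied. The helper name `parse_action` and the tuple it returns are illustrative assumptions, not part of the design. Capture groups are added to the `Deploy` pattern, and the 1-based `(x, y)` pair is converted to 0-based `board[row][col]` indices, treating x as the row.
+
+```python
+import re
+
+# Patterns from the action table, with capture groups added for the coordinates.
+DEPLOY_RE = re.compile(r"^\[Deploy:([1-3]),([1-3])\]$")
+SCAN_RE = re.compile(r"^\[Scan\]$")
+
+
+def parse_action(text):
+    """Return ("deploy", row, col), ("scan",) or None (hypothetical helper)."""
+    match = DEPLOY_RE.match(text)
+    if match:
+        x, y = int(match.group(1)), int(match.group(2))
+        # Assumption: x is the row and y the column; shift to 0-based indices.
+        return ("deploy", x - 1, y - 1)
+    if SCAN_RE.match(text):
+        return ("scan",)
+    return None
+```
+
+With this sketch, `parse_action("[Deploy:2,3]")` yields `("deploy", 1, 2)`, while `[Deploy:4,1]` and `[ScanGrid]` both yield `None`.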
+
+---
+
+## 5. Game State Schema
+
+Example serialized game state:
+
+```json
+{
+  "turn_count": 4,
+  "current_player": "Commander Solis",
+  "board": [
+    ["S", "N", " "],
+    [" ", "S", " "],
+    ["N", " ", " "]
+  ],
+  "players": {
+    "Commander Solis": {
+      "symbol": "S",
+      "actions_taken": ["[Deploy:1,1]", "[Deploy:2,2]"]
+    },
+    "Commander Nyx": {
+      "symbol": "N",
+      "actions_taken": ["[Deploy:1,2]", "[Deploy:3,1]"]
+    }
+  },
+  "winner": null,
+  "is_terminal": false,
+  "last_action": "[Deploy:3,1]",
+  "observation_log": [
+    "Commander Solis deployed to 1,1",
+    "Commander Nyx deployed to 1,2",
+    "Commander Solis deployed to 2,2",
+    "Commander Nyx deployed to 3,1"
+  ],
+  "seed": 42
+}
+```
+
+---
+
+## 6. Initialization Rules
+
+- **Board**: Empty 3×3 grid represented as a list of lists containing `" "`.  
+- **Starting player**: Commander Solis always starts.  
+- **Seeding**: Random seed (e.g., `seed=42`) stored in `game_state` for deterministic replay.  
+- **Onboarding observations**:  
+  Upon `reset`, each player receives:  
+  - The empty grid state.  
+  - Instructions on how to deploy satellites and when the game concludes.
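+
+As a rough illustration of these rules, the initial state could be built as below. The function name `new_game_state` is hypothetical, and starting `turn_count` at 0 and `last_action` at `None` are assumptions; the keys follow the schema in Section 5.
+
+```python
+def new_game_state(seed=42):
+    """Build an empty starting state mirroring the Section 5 schema (illustrative)."""
+    return {
+        "turn_count": 0,  # assumption: counts completed turns
+        "current_player": "Commander Solis",  # Solis always starts
+        # Build each row separately so the rows do not share one list object.
+        "board": [[" "] * 3 for _ in range(3)],
+        "players": {
+            "Commander Solis": {"symbol": "S", "actions_taken": []},
+            "Commander Nyx": {"symbol": "N", "actions_taken": []},
+        },
+        "winner": None,
+        "is_terminal": False,
+        "last_action": None,  # assumption: no action before the first turn
+        "observation_log": [],
+        "seed": seed,  # stored for deterministic replay
+    }
+```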
+
+---
+
+## 7. Validation and Error Handling
+
+**Validation checks in order:**
+1. Verify that the extracted content matches one of the valid action patterns.
+2. For `[Deploy:x,y]`, ensure:
+   - x, y within range 1–3.
+   - Target cell is empty.
+3. For `[Scan]`, ensure no other content is appended.
+4. If the regex or move legality fails, call  
+   `set_invalid_move(player, reason)`  
+   with one of:
+   - `"Malformed action syntax"`
+   - `"Coordinates out of range"`
+   - `"Target cell occupied"`
+   - `"Unrecognized action token"`
+
+Action extraction must strip wrapping `\boxed{{...}}`, leaving only the internal content for validation.
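+
+The extraction step and the ordered checks above can be sketched as follows. This is one possible reading, not a definitive implementation: the helper names `extract_action` and `validate` are hypothetical, the wrapper is assumed to appear with either single or double braces, and the split between "malformed" and "unrecognized" (based on the token name after the opening bracket) is an assumption. A looser `Deploy` pattern is used so that out-of-range coordinates can be reported separately from syntax errors.
+
+```python
+import re
+
+# Assumption: the wrapper may be written with one or two braces.
+BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+")
+TOKEN_RE = re.compile(r"^\[([A-Za-z]+)")
+LOOSE_DEPLOY_RE = re.compile(r"^\[Deploy:([0-9]+),([0-9]+)\]$")
+
+
+def extract_action(response):
+    """Return the content of the last boxed wrapper, or None if there is none."""
+    matches = BOXED_RE.findall(response)
+    return matches[-1].strip() if matches else None
+
+
+def validate(action, board):
+    """Return None for a legal action, otherwise one of the four reason strings."""
+    if action is None:
+        return "Malformed action syntax"  # no boxed wrapper was found
+    token = TOKEN_RE.match(action)
+    if token is None:
+        return "Malformed action syntax"
+    name = token.group(1)
+    if name not in ("Deploy", "Scan"):
+        return "Unrecognized action token"
+    if name == "Scan":
+        # Nothing may be appended to the bare [Scan] token.
+        return None if action == "[Scan]" else "Malformed action syntax"
+    match = LOOSE_DEPLOY_RE.match(action)
+    if match is None:
+        return "Malformed action syntax"
+    x, y = int(match.group(1)), int(match.group(2))
+    if not (1 <= x <= 3 and 1 <= y <= 3):
+        return "Coordinates out of range"
+    if board[x - 1][y - 1] != " ":
+        return "Target cell occupied"
+    return None
+```
+
+A caller would pass any non-`None` result to `set_invalid_move(player, reason)`.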
+
+---
+
+## 8. Terminal Conditions and Scoring
+
+**After each move**, the system checks:
+
+1. **Win Check:**  
+   - Rows, columns, and diagonals scanned for `['S', 'S', 'S']` or `['N', 'N', 'N']`.  
+   - The corresponding player is marked `winner`.
+2. **Draw Check:**  
+   - If `turn_count == 9` and no winner ⇒ `"DRAW"`.
+3. **Score Rules:**  
+   - Winner = 1, Loser = 0.  
+   - Draw = 0.5 each.
+
+Tie-breakers are deterministic—no randomness or hidden state.
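+
+A short sketch of the win, draw, and score checks described in this section. The function names `find_winner` and `score` are illustrative, and it is assumed that `turn_count` counts completed turns; symbols rather than player names are used as score keys for brevity.
+
+```python
+def find_winner(board):
+    """Return "S" or "N" if that symbol fills a row, column, or diagonal, else None."""
+    lines = [list(row) for row in board]  # three rows
+    lines += [[board[r][c] for r in range(3)] for c in range(3)]  # three columns
+    lines.append([board[i][i] for i in range(3)])  # main diagonal
+    lines.append([board[i][2 - i] for i in range(3)])  # anti-diagonal
+    for line in lines:
+        if line[0] != " " and line[0] == line[1] == line[2]:
+            return line[0]
+    return None
+
+
+def score(board, turn_count):
+    """Return (is_terminal, scores); scores is None while the game continues."""
+    winner = find_winner(board)
+    if winner is not None:
+        loser = "N" if winner == "S" else "S"
+        return True, {winner: 1.0, loser: 0.0}
+    if turn_count == 9:
+        return True, {"S": 0.5, "N": 0.5}  # draw
+    return False, None
+```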
+
+---
+
+## 9. Player Prompt Specification
+
+**Prompt Outline:**
+
+> **IDENTITY BLURB:**  
+> You are a star commander controlling a fleet of satellites orbiting a dying star. Your mission is to align three of your satellites in a row across the 3×3 orbital grid before your rival does.
+>
+> **CURRENT STATE:**  
+> - The board shows Commander Solis's placements (`S`) and Commander Nyx's (`N`).  
+> - Empty cells are blank spaces.
+>
+> **AVAILABLE ACTIONS:**  
+> - `[Deploy:x,y]` → Place your satellite at coordinates (x,y) where x,y ∈ {1,2,3}.  
+> - `[Scan]` → Forfeit placement this turn to inspect the full orbital map.
+>
+> **FORMAT RULES:**  
+> - Each response must end with: `\boxed{{<action>}}`  
+> - Example of valid response:  
+>   ```
+>   I will secure the top-right orbit next.
+>   \boxed{{[Deploy:1,3]}}
+>   ```
+> - Example of invalid response:  
+>   ```
+>   Let’s attack next time. 
+>   [Deploy:1,3]
+>   ```
+>   (Because it's missing `\boxed{{}}`.)
+>
+> **REMINDERS:**  
+> - You cannot deploy on an occupied orbit.  
+> - The game will end immediately if three satellites align or all nine orbits are filled.  
+
+All dialogue and moves are appended to the shared `observation_log`.
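+
+For the **CURRENT STATE** part of the prompt, the board has to be turned into text. One possible rendering is sketched below; the function name `render_board` and the layout with 1-based row and column labels are assumptions, chosen so that the labels match the `(x,y)` coordinates used in `[Deploy:x,y]`.
+
+```python
+def render_board(board):
+    """Format the 3x3 grid as text with 1-based row and column labels."""
+    lines = ["    1   2   3"]  # column labels (y)
+    for i, row in enumerate(board, start=1):
+        # Row label (x), then the three cells separated by " | ".
+        lines.append(f"{i}   " + " | ".join(row))
+    return "\n".join(lines)
+```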
+
+---
+
+## 10. API Mapping Plan
+
+| API Method | Purpose | Primary Read/Write | Terminal logic |
+|-------------|----------|-------------------|----------------|
+| `reset(seed)` | Initializes the grid, assigns symbols, clears logs, and sets starting player. | Writes entire `game_state`. | Returns initial observation and seed confirmation. |
+| `step(action)` | Validates player’s boxed action, updates the grid/state, switches turns. | Reads `current_player`, `board`; writes updates, logs. | Runs win/draw checks after every move; sets `is_terminal`, `winner`. |
+| `_generate_player_prompt(player)` | Builds textual prompt shown above, embedding the latest board and prior logs. | Reads from `board`, `observation_log`, and `current_player`. | Does not modify state; only generates text. |
+
+On invalid actions, `step` calls `set_invalid_move(player, reason)`; the player may then retry, or loses immediately if the action cannot be corrected within the same turn.
+
+---
+
+## 11. Copy-Check Against Example
+
+All entity names (**Commander Solis**, **Commander Nyx**, **satellites**, **orbital grid**) and thematic terms are **original** and unrelated to any example negotiation or deal-making scenario. The game’s objective (aligning satellites on a 3×3 grid) derives from *tic-tac-toe mechanics* but expressed in a wholly new narrative context.  
+All `game_state` keys (`board`, `winner`, `observation_log`, `symbol`, etc.) are unique to *Orbital Align*, and none are borrowed from any trading, diplomacy, or economic system.