Add environment documentation from Openverse builder

Openverse Builder
2001-01-01 00:00:00 +00:00
parent 62cd544aaf
commit a3fe4321df

environment.md Normal file

@@ -0,0 +1,193 @@
# **Game Design Document: “Orbital Align” (Deterministic Turn-Based Strategy Inspired by Tic-Tac-Toe)**
---
## 1. Concept Paragraph
**Setting & Theme:**
In *Orbital Align*, two rival star captains compete to align their fleets of orbital satellites across a 3×3 planetary grid suspended around a dying star. Unlike classic tic-tac-toe, this version reimagines the board as orbital nodes where each satellite placement represents a strategic claim of spatial control. The goal is to align three satellites in a row—horizontally, vertically, or diagonally—before the opponent does.
**Core action tokens:**
`[Deploy:x,y]` (to place a satellite on coordinates), and `[Scan]` (forfeit placement to reveal the current grid state).
This design is *completely unrelated* to any previous negotiation or resource trading example. It uses a new setting, terminology, and objectives.
---
## 2. Roles and Win Condition
**Roles:**
- **Player A (Commander Solis)** and **Player B (Commander Nyx)** each command a distinct orbital fleet.
- Each player's satellite is marked distinctly (`S` for Solis, `N` for Nyx).
**Win Condition:**
- A player wins if they align **three of their satellites** consecutively in any row, column, or diagonal.
- If all nine grid cells are filled without a winning alignment, the result is a **draw**.
**Loss Condition:**
- A player loses if the opponent achieves an alignment before them.
- A player also loses immediately if they perform an **invalid action** that cannot be corrected within the same turn.
---
## 3. Turn Structure and Determinism
- The game progresses in **alternating turns**, starting with Commander Solis (Player A).
- **Each turn**: Current player chooses one action (`Deploy` or `Scan`).
- Maximum **turn limit**: 9 (the grid has 9 total cells).
- The environment uses a reproducible **random seed**—though this game itself has no stochastic actions, seeding ensures deterministic ordering if future extensions add random elements.
---
## 4. Action Grammar (Machine-Parseable)
**Permitted Action Tokens**
| Action | Meaning | Formal Regex | Example Valid | Example Invalid | Reason Invalid |
|:--|:--|:--|:--|:--|:--|
| `[Deploy:x,y]` | Place a satellite at coordinates (x,y) where x,y ∈ {1,2,3} | `^\[Deploy:(?:[1-3]),(?:[1-3])\]$` | `[Deploy:2,3]` | `[Deploy:4,1]` | 4 outside valid range |
| `[Scan]` | View the current orbital grid instead of placing | `^\[Scan\]$` | `[Scan]` | `[ScanGrid]` | Incorrect token name |
**Rules:**
- Coordinates (x,y) correspond to the grid: (1,1) = top-left, (3,3) = bottom-right.
- No double occupation allowed—if a player tries to `Deploy` on an occupied node, it is invalid.
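The grammar above can be sketched as a small parser. This is a minimal illustration, not the environment's actual code: the function name `parse_action` and its return shape are assumptions, and the Deploy pattern uses capturing groups (the table's version is non-capturing) so the coordinates can be read back out.

```python
import re

# Patterns adapted from the grammar table; capturing groups are added so that
# x and y can be extracted. fullmatch() rejects any trailing content.
DEPLOY_RE = re.compile(r"\[Deploy:([1-3]),([1-3])\]")
SCAN_RE = re.compile(r"\[Scan\]")


def parse_action(token: str):
    """Return ("deploy", x, y), ("scan",), or None for an invalid token.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    match = DEPLOY_RE.fullmatch(token)
    if match:
        return ("deploy", int(match.group(1)), int(match.group(2)))
    if SCAN_RE.fullmatch(token):
        return ("scan",)
    return None
```

The occupancy rule is not covered here because it needs the board; this sketch only checks syntax and coordinate range.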
---
## 5. Game State Schema
Example serialized game state:
```json
{
  "turn_count": 4,
  "current_player": "Commander Solis",
  "board": [
    ["S", "N", " "],
    [" ", "S", " "],
    ["N", " ", " "]
  ],
  "players": {
    "Commander Solis": {
      "symbol": "S",
      "actions_taken": ["[Deploy:1,1]", "[Deploy:2,2]"]
    },
    "Commander Nyx": {
      "symbol": "N",
      "actions_taken": ["[Deploy:1,2]", "[Deploy:3,1]"]
    }
  },
  "winner": null,
  "is_terminal": false,
  "last_action": "[Deploy:3,1]",
  "observation_log": [
    "Commander Solis deployed to 1,1",
    "Commander Nyx deployed to 1,2",
    "Commander Solis deployed to 2,2",
    "Commander Nyx deployed to 3,1"
  ],
  "seed": 42
}
```
---
## 6. Initialization Rules
- **Board**: Empty 3×3 grid represented as a list of lists containing `" "`.
- **Starting player**: Commander Solis always starts.
- **Seeding**: Random seed (e.g., `seed=42`) stored in `game_state` for deterministic replay.
- **Onboarding observations**:
Upon `reset`, each player receives:
- The empty grid state.
- Instructions on how to deploy satellites and when the game concludes.
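The initialization rules can be sketched as a function that builds a fresh state with the keys from the schema in Section 5. The function name `new_game_state` is hypothetical, and the starting values `turn_count = 0` and `last_action = None` are assumptions, since the document does not state them.

```python
def new_game_state(seed: int = 42) -> dict:
    """Build an initial game_state dict (hypothetical helper, not the real API)."""
    return {
        "turn_count": 0,  # assumption: counts completed turns
        "current_player": "Commander Solis",  # Solis always starts
        # Build each row separately so the rows are independent lists.
        "board": [[" "] * 3 for _ in range(3)],
        "players": {
            "Commander Solis": {"symbol": "S", "actions_taken": []},
            "Commander Nyx": {"symbol": "N", "actions_taken": []},
        },
        "winner": None,
        "is_terminal": False,
        "last_action": None,  # assumption: no action has been taken yet
        "observation_log": [],
        "seed": seed,  # stored for deterministic replay
    }
```

The rows are built in a comprehension rather than with `[[" "] * 3] * 3`, which would make all three rows aliases of the same list.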
---
## 7. Validation and Error Handling
**Validation checks in order:**
1. Verify that the extracted content matches one of the valid action patterns.
2. For `[Deploy:x,y]`, ensure:
   - x and y are within the range 1 to 3.
   - The target cell is empty.
3. For `[Scan]`, ensure no other content is appended.
4. If the regex or move legality fails, call `set_invalid_move(player, reason)` with one of:
   - `"Malformed action syntax"`
   - `"Coordinates out of range"`
   - `"Target cell occupied"`
   - `"Unrecognized action token"`
Action extraction must strip wrapping `\boxed{{...}}`, leaving only the internal content for validation.
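A minimal sketch of extraction followed by the ordered checks might look like the following. Several things here are assumptions rather than part of the specification: the helper names, acceptance of either single or doubled braces after `\boxed`, the reading of `x` as the row and `y` as the column (inferred from the prompt example in which `[Deploy:1,3]` is the top-right orbit), and the mapping of each failure to one of the four reason strings.

```python
import re

# Accepts \boxed{...} or \boxed{{...}} at the end of the response (assumption).
BOXED_RE = re.compile(r"\\boxed\{+([^{}]*)\}+\s*$")
# Looser than the grammar regex so out-of-range integers can be told apart
# from malformed syntax.
DEPLOY_SHAPE_RE = re.compile(r"\[Deploy:(-?\d+),(-?\d+)\]")


def extract_action(response: str):
    """Return the content inside the trailing \\boxed wrapper, or None."""
    match = BOXED_RE.search(response)
    return match.group(1).strip() if match else None


def validate_action(response: str, board):
    """Return (is_valid, reason); reason is None for a valid action."""
    token = extract_action(response)
    if token is None:
        return False, "Malformed action syntax"
    if token == "[Scan]":
        return True, None
    if not token.startswith("[Deploy:"):
        return False, "Unrecognized action token"
    match = DEPLOY_SHAPE_RE.fullmatch(token)
    if match is None:
        return False, "Malformed action syntax"
    x, y = int(match.group(1)), int(match.group(2))
    if not (1 <= x <= 3 and 1 <= y <= 3):
        return False, "Coordinates out of range"
    # Assumption: x indexes the row and y the column, both 1-based.
    if board[x - 1][y - 1] != " ":
        return False, "Target cell occupied"
    return True, None
```

In this sketch the caller would pass the returned reason to `set_invalid_move(player, reason)` whenever the first element is `False`.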
---
## 8. Terminal Conditions and Scoring
**After each move**, the system checks:
1. **Win Check:**
- Rows, columns, and diagonals scanned for `['S', 'S', 'S']` or `['N', 'N', 'N']`.
- The corresponding player is marked `winner`.
2. **Draw Check:**
- If `turn_count == 9` and no winner ⇒ `"DRAW"`.
3. **Score Rules:**
- Winner = 1, Loser = 0.
- Draw = 0.5 each.
Tie-breakers are deterministic—no randomness or hidden state.
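The win, draw, and score rules above could be sketched as follows. The function names and the `(result, scores)` return shape are illustrative assumptions; the draw test follows the `turn_count == 9` rule as written.

```python
def find_winner_symbol(board):
    """Return "S" or "N" if that symbol fills a row, column, or diagonal."""
    lines = [list(row) for row in board]                             # rows
    lines += [[board[r][c] for r in range(3)] for c in range(3)]     # columns
    lines.append([board[i][i] for i in range(3)])                    # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])                # anti-diagonal
    for line in lines:
        if line[0] != " " and line[0] == line[1] == line[2]:
            return line[0]
    return None


def outcome(board, turn_count):
    """Return (result, scores); (None, None) while the game is still running."""
    names = {"S": "Commander Solis", "N": "Commander Nyx"}
    symbol = find_winner_symbol(board)
    if symbol is not None:
        scores = {name: (1.0 if sym == symbol else 0.0) for sym, name in names.items()}
        return names[symbol], scores
    if turn_count == 9:
        return "DRAW", {name: 0.5 for name in names.values()}
    return None, None
```

The win check runs before the draw check, so a winning move on the ninth turn is scored as a win rather than a draw.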
---
## 9. Player Prompt Specification
**Prompt Outline:**
> **IDENTITY BLURB:**
> You are a star commander controlling a fleet of satellites orbiting a dying star. Your mission is to align three of your satellites in a row across the 3×3 orbital grid before your rival does.
>
> **CURRENT STATE:**
> - The board shows Commander Solis's satellites (S) and Commander Nyx's satellites (N).
> - Empty cells are blank spaces.
>
> **AVAILABLE ACTIONS:**
> - `[Deploy:x,y]` → Place your satellite at coordinates (x,y) where x,y ∈ {1,2,3}.
> - `[Scan]` → Forfeit placement this turn to inspect the full orbital map.
>
> **FORMAT RULES:**
> - Each response must end with: `\boxed{{<action>}}`
> - Example of valid response:
> ```
> I will secure the top-right orbit next.
> \boxed{{[Deploy:1,3]}}
> ```
> - Example of invalid response:
> ```
> Let's attack next time.
> [Deploy:1,3]
> ```
> (Because it's missing `\boxed{{}}`.)
>
> **REMINDERS:**
> - You cannot deploy on an occupied orbit.
> - The game will end immediately if three satellites align or all nine orbits are filled.
All dialogue and moves are appended to the shared `observation_log`.
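As a rough illustration of how the prompt outline might be assembled from state, the sketch below renders the board and joins the sections into one string. The function names, the `|` separators in the board rendering, and the choice to show only the three most recent log entries are all assumptions made for this example.

```python
def render_board(board):
    """Render the grid one row per line; empty cells stay as blank spaces."""
    return "\n".join(" | ".join(row) for row in board)


def build_prompt(player, board, observation_log):
    """Assemble a text prompt for `player` (hypothetical helper)."""
    parts = [
        f"You are {player}, a star commander controlling a fleet of satellites "
        "orbiting a dying star. Align three of your satellites in a row across "
        "the 3x3 orbital grid before your rival does.",
        "CURRENT STATE:",
        render_board(board),
        "RECENT EVENTS:",  # assumption: only the last three log entries are shown
        *observation_log[-3:],
        "AVAILABLE ACTIONS:",
        "[Deploy:x,y] -> Place your satellite at coordinates (x,y), x,y in {1,2,3}.",
        "[Scan] -> Forfeit placement this turn to inspect the full orbital map.",
        "FORMAT RULES:",
        # Raw string so the backslash and braces are kept literally.
        r"Each response must end with: \boxed{{<action>}}",
    ]
    return "\n".join(parts)
```

The function only reads its arguments, which matches the stated intent that prompt generation does not modify state.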
---
## 10. API Mapping Plan
| API Method | Purpose | Primary Read/Write | Terminal logic |
|-------------|----------|-------------------|----------------|
| `reset(seed)` | Initializes the grid, assigns symbols, clears logs, and sets starting player. | Writes entire `game_state`. | Returns initial observation and seed confirmation. |
| `step(action)` | Validates the player's boxed action, updates the grid/state, switches turns. | Reads `current_player`, `board`; writes updates, logs. | Runs win/draw checks after every move; sets `is_terminal`, `winner`. |
| `_generate_player_prompt(player)` | Builds textual prompt shown above, embedding the latest board and prior logs. | Reads from `board`, `observation_log`, and `current_player`. | Does not modify state; only generates text. |
On invalid actions, `step` calls `set_invalid_move(player, reason)` and either allows a retry within the same turn or, if the action cannot be corrected, ends the game with a loss for that player.
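To show how the three methods could fit together, here is a compact sketch of an environment class. It is not the actual implementation: the class name `OrbitalAlignEnv` is hypothetical, `step` is assumed to receive the already-unboxed action token, the state is trimmed to the keys the sketch needs, and an invalid move ends the game at once instead of offering a retry.

```python
import re


class OrbitalAlignEnv:
    """Hypothetical minimal environment following the API table above."""

    SYMBOLS = {"Commander Solis": "S", "Commander Nyx": "N"}

    def reset(self, seed=42):
        self.state = {
            "turn_count": 0, "current_player": "Commander Solis",
            "board": [[" "] * 3 for _ in range(3)],
            "winner": None, "is_terminal": False,
            "observation_log": [], "seed": seed,
        }
        return self.state

    def _other(self, player):
        return next(p for p in self.SYMBOLS if p != player)

    def set_invalid_move(self, player, reason):
        # Simplification: no retry, the offending player loses immediately.
        self.state["observation_log"].append(f"{player} invalid: {reason}")
        self.state.update(winner=self._other(player), is_terminal=True)

    def _has_line(self, symbol):
        b = self.state["board"]
        lines = [[(r, c) for c in range(3)] for r in range(3)]    # rows
        lines += [[(r, c) for r in range(3)] for c in range(3)]   # columns
        lines += [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
        return any(all(b[r][c] == symbol for r, c in line) for line in lines)

    def step(self, action):
        s = self.state
        if s["is_terminal"]:
            raise RuntimeError("game is over; call reset() first")
        player = s["current_player"]
        if action == "[Scan]":
            s["observation_log"].append(f"{player} scanned the grid")
        else:
            m = re.fullmatch(r"\[Deploy:([1-3]),([1-3])\]", action)
            if m is None:
                self.set_invalid_move(player, "Malformed action syntax")
                return s
            x, y = int(m.group(1)), int(m.group(2))  # assumption: x=row, y=column
            if s["board"][x - 1][y - 1] != " ":
                self.set_invalid_move(player, "Target cell occupied")
                return s
            s["board"][x - 1][y - 1] = self.SYMBOLS[player]
            s["observation_log"].append(f"{player} deployed to {x},{y}")
        s["turn_count"] += 1
        if self._has_line(self.SYMBOLS[player]):
            s.update(winner=player, is_terminal=True)
        elif s["turn_count"] == 9:
            s.update(winner="DRAW", is_terminal=True)
        else:
            s["current_player"] = self._other(player)
        return s
```

A short game under these assumptions: after `reset()`, the moves `[Deploy:1,1]`, `[Deploy:1,2]`, `[Deploy:2,2]`, `[Deploy:3,1]`, `[Deploy:3,3]` give Commander Solis the main diagonal on the fifth turn.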
---
## 11. Copy-Check Against Example
All entity names (**Commander Solis**, **Commander Nyx**, **satellites**, **orbital grid**) and thematic terms are **original** and unrelated to any example negotiation or deal-making scenario. The game's objective (aligning satellites on a 3×3 grid) derives from *tic-tac-toe mechanics* but is expressed in a wholly new narrative context.
All `game_state` keys (`board`, `winner`, `observation_log`, `symbol`, etc.) are unique to *Orbital Align*, and none are borrowed from any trading, diplomacy, or economic system.