Add environment documentation from Openverse builder

Openverse Builder
2001-01-01 00:00:00 +00:00
parent 1e40154fa0
commit f955406004

environment.md

# GAME DESIGN DOCUMENT — **"StarGrid Duel"**
---
### 1. Concept Paragraph
**StarGrid Duel** is a deterministic, turn-based strategy game inspired by the simplicity of grid conquest, but it is **not** tic-tac-toe. Two rival star navigators take turns deploying *energy beacons* on a 3×3 stellar grid. Their aim is to align three of their own beacons in a straight line of cosmic power (horizontal, vertical, or diagonal) before the opponent does, or to fill the grid entirely for a balanced standoff. Players issue commands such as `[Place: A2]` to deposit a beacon on a coordinate. The environment is purely deterministic: no randomness or negotiation mechanics are involved. The game's purpose is to measure spatial foresight and terminal pattern recognition, and it is completely unrelated to any negotiation or resource-trading examples.
---
### 2. Roles and Win Condition
- **Roles**
  - **Player A ("Navigator Alpha")**: Uses energy color **Blue**.
  - **Player B ("Navigator Beta")**: Uses energy color **Crimson**.
- **Objective**
  Be the first navigator to align three of your beacons continuously (row, column, or diagonal) on the 3×3 StarGrid.
- **Win Rule**
  - A player **wins** immediately upon forming a line of three of their own beacons.
  - The game is a **draw** if all nine cells are filled without a three-in-a-line configuration.
  - Upon win or draw, the game enters a terminal state and no further actions are accepted.
---
### 3. Turn Structure and Determinism
- Players alternate turns beginning with Player A at turn index `0`.
- Each turn is atomic: exactly one action is taken.
- A seed is accepted for reproducibility. The game itself requires no random ordering, so the same seed and the same action sequence always produce the same sequence of states.
- The turn counter increments after each valid action. Once nine valid turns have been processed or a win condition is met, the environment halts.
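
The alternation rule above can be written as a pure function of the turn counter. This is a minimal sketch; the helper name `active_player_for` is hypothetical and does not appear in the design.

```python
def active_player_for(turn_index: int) -> str:
    """Return "A" on even turn indices and "B" on odd ones.

    Hypothetical helper: Player A moves at turn index 0, and the counter
    only advances after a valid action, so parity decides who moves.
    """
    if not 0 <= turn_index <= 8:
        raise ValueError("turn_index must be in 0..8 on a 3x3 grid")
    return "A" if turn_index % 2 == 0 else "B"


# Full move order for a game that runs all nine turns.
order = [active_player_for(i) for i in range(9)]
print(order)
```

Because Player A moves on every even index, A places five beacons and B places four in a game that fills the grid.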
---
### 4. Action Grammar (Machine-Parsable)
Players specify grid placement commands targeting one unused cell.
**Allowed Actions**
```
[Place: <cell_id>]
```
**Cell IDs**
Valid values: `A1, A2, A3, B1, B2, B3, C1, C2, C3` (Rows A-C, Columns 1-3)
**Formal Pattern (Regex)**
`^\[Place:\s*(A|B|C)(1|2|3)\]$`
**Examples**
- **Valid:** `[Place: B2]` → Places the player's beacon in the center cell.
- **Invalid Examples:**
  - `[place: B2]` → Lowercase `place` does not match the case-sensitive `Place` token.
  - `[Place: D1]` → `D1` not in allowed grid range.
  - `[Deploy: A1]` → Invalid action token.
  - `[Place: B2 extra]` → Extra text violates strict grammar.
Player outputs are wrapped in `\boxed{{…}}`. The implementation extracts the inner `[Place: X#]` command and validates it against the pattern above.
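
The grammar can be exercised directly with Python's `re` module. The sketch below uses the pattern exactly as written above; the function name `parse_place` is an assumption for illustration. It uses `fullmatch` because `$` alone would also accept a trailing newline in Python.

```python
import re
from typing import Optional

# Pattern copied verbatim from the grammar above.
PLACE_PATTERN = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")


def parse_place(command: str) -> Optional[str]:
    """Return the cell id (for example "B2") or None if the command is invalid.

    Hypothetical helper. fullmatch() is used so that a trailing newline,
    which "$" would tolerate under match(), is rejected as extra text.
    """
    match = PLACE_PATTERN.fullmatch(command)
    if match is None:
        return None
    return match.group(1) + match.group(2)


for sample in ["[Place: B2]", "[place: B2]", "[Place: D1]", "[Deploy: A1]", "[Place: B2 extra]"]:
    print(sample, "->", parse_place(sample))
```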
---
### 5. Game State Schema
```json
{
  "turn_index": 4,
  "active_player": "A",
  "board": {
    "A1": "Blue",
    "A2": null,
    "A3": "Crimson",
    "B1": "Blue",
    "B2": "Crimson",
    "B3": null,
    "C1": null,
    "C2": null,
    "C3": null
  },
  "player_symbols": {
    "A": "Blue",
    "B": "Crimson"
  },
  "move_history": [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"}
  ],
  "winner": null,
  "is_draw": false,
  "observations": {
    "A": "Text transcript of latest game state for Alpha",
    "B": "Text transcript of latest game state for Beta"
  },
  "seed": 42
}
```
---
### 6. Initialization Rules
- `reset(seed)` initializes an empty 3×3 board with all cells `null`.
- The turn index resets to `0` with `active_player = "A"`.
- The same seed always ensures that turn order, board labeling, and any deterministic tie logic behave identically.
- Both players receive an onboarding observation describing:
  - Empty StarGrid layout
  - Their color and symbol
  - Instructions and the legal action syntax
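
As an illustration of the initialization rules, the following sketch builds a fresh state dictionary with the keys from the Game State Schema. The function name `new_state` and the empty observation strings are assumptions, not part of the design.

```python
from typing import Any, Dict

# Cell labels in row-major order: A1, A2, A3, B1, ..., C3.
CELL_IDS = [row + col for row in "ABC" for col in "123"]


def new_state(seed: int) -> Dict[str, Any]:
    """Build an empty StarGrid state (hypothetical helper).

    The seed is only stored; nothing here draws random numbers, so two
    calls with the same seed return equal states.
    """
    return {
        "turn_index": 0,
        "active_player": "A",
        "board": {cell: None for cell in CELL_IDS},
        "player_symbols": {"A": "Blue", "B": "Crimson"},
        "move_history": [],
        "winner": None,
        "is_draw": False,
        # Placeholder text; the real onboarding observation is not specified here.
        "observations": {"A": "", "B": ""},
        "seed": seed,
    }


state = new_state(42)
print(state["active_player"], len(state["board"]))
```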
---
### 7. Validation and Error Handling
- Upon receiving a player move, extract the content inside `\boxed{{}}` using `_extract_answer_content`.
- Validate against the regex `^\[Place:\s*(A|B|C)(1|2|3)\]$`.
- Check that the specified cell is unoccupied.
- **Invalid Move Reasons**
  - `"MalformedAction"`: Does not match required pattern.
  - `"CellOutOfRange"`: Coordinate not part of StarGrid labels.
  - `"CellOccupied"`: Target cell already taken.
  - `"NotYourTurn"`: Attempt to act out of sequence, such as after the game has ended or during the opponent's turn.
The environment calls `set_invalid_move(reason)` with a human-readable reason, retaining determinism (the turn is forfeited or handled as a draw according to policy).
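
One way to map a command to the reason codes above is sketched below. Note that the strict pattern already rejects `D1`, so telling `"CellOutOfRange"` apart from `"MalformedAction"` needs a second, looser shape check. That looser pattern (`LOOSE`) and the function name `invalid_reason` are assumptions made for this example only.

```python
import re
from typing import Dict, Optional

STRICT = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
# Assumption, not in the design: any single uppercase letter plus digit counts
# as a well-formed coordinate, so that D1 can be reported as out of range.
LOOSE = re.compile(r"^\[Place:\s*([A-Z])([0-9])\]$")


def invalid_reason(
    command: str,
    board: Dict[str, Optional[str]],
    player: str,
    active_player: str,
    game_over: bool = False,
) -> Optional[str]:
    """Return a reason code for an invalid move, or None if the move is legal."""
    if game_over or player != active_player:
        return "NotYourTurn"
    strict = STRICT.fullmatch(command)
    if strict is None:
        # Well-shaped but outside A1..C3 versus not matching the grammar at all.
        return "CellOutOfRange" if LOOSE.fullmatch(command) else "MalformedAction"
    cell = strict.group(1) + strict.group(2)
    if board[cell] is not None:
        return "CellOccupied"
    return None


board = {r + c: None for r in "ABC" for c in "123"}
board["B2"] = "Blue"
print(invalid_reason("[Place: B2]", board, "B", "B"))
```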
---
### 8. Terminal Conditions and Scoring
**Checks performed each turn, immediately after a valid beacon is placed:**
1. **Victory Check:** If the current player's beacons form any of the eight winning line patterns, set `winner = active_player` and terminate the game.
2. **Draw Check:** If no empty cells remain and no winner exists, set `is_draw = true`.
3. **Scoring**
   - Win: `+1` score for winner, `0` for loser.
   - Draw: `0.5` each as tie credit (for potential series mode).
**Tie-Break Procedure**
If a single placement completes more than one line at once (all such lines necessarily belong to the same player), the first detected alignment pattern is applied deterministically.
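
The eight winning lines and the scoring rule can be sketched as follows. The function name `evaluate`, the line ordering, and the zero scores returned for an unfinished game are assumptions. Unlike the check described above, this sketch scans lines for either color, which gives the same result since only the player who just moved can have completed a line.

```python
from typing import Dict, Optional, Tuple

WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),  # rows
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),  # columns
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),                      # diagonals
]


def evaluate(
    board: Dict[str, Optional[str]],
    player_symbols: Dict[str, str],
) -> Tuple[Optional[str], bool, Dict[str, float]]:
    """Return (winner, is_draw, scores) for the given board.

    Lines are scanned in a fixed order, so the first detected alignment
    is always the same for the same board.
    """
    for line in WIN_LINES:
        first = board[line[0]]
        if first is not None and all(board[cell] == first for cell in line):
            winner = next(p for p, sym in player_symbols.items() if sym == first)
            loser = "B" if winner == "A" else "A"
            return winner, False, {winner: 1.0, loser: 0.0}
    if all(value is not None for value in board.values()):
        return None, True, {"A": 0.5, "B": 0.5}
    # Assumption: an unfinished game carries no score yet.
    return None, False, {"A": 0.0, "B": 0.0}


symbols = {"A": "Blue", "B": "Crimson"}
diagonal = {r + c: None for r in "ABC" for c in "123"}
diagonal.update({"A1": "Blue", "B2": "Blue", "C3": "Blue", "A2": "Crimson", "A3": "Crimson"})
print(evaluate(diagonal, symbols))
```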
---
### 9. Player Prompt Specification
Each player receives a structured prompt reflecting the current board and legal moves.
**Prompt Outline**
> **Identity Blurb:**
> You are a star navigator placing energy beacons on a galactic grid. Each cell you claim radiates your color's energy. The goal is to align three of your beacons in a line before the opponent.
> **Current Board State:**
> - Display a 3×3 grid with coordinates and current occupancy.
> **Your Color:** Blue or Crimson
> **Turn Information:** Which player moves next (`Navigator Alpha` or `Navigator Beta`)
> **Allowed Actions:**
> Format: `[Place: <cell_id>]`, where `<cell_id>` ∈ {A1,…,C3} and the cell must be empty.
> You must wrap your selected action inside `\boxed{{}}` at the end of your message.
> **Response Format:**
> You may reason about your move, then output your final choice within `\boxed{{}}`.
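
The board display and the list of legal cells in the prompt could be produced by helpers like the ones below. This is only a sketch: the single-letter glyphs (`B`, `C`, `.`), the layout, and the names `render_board` and `open_cells` are assumptions, since the design does not fix a text format.

```python
from typing import Dict, List, Optional


def render_board(board: Dict[str, Optional[str]]) -> str:
    """Render the grid as text with column numbers on top and row letters on the left."""
    # Assumed glyphs: B for Blue, C for Crimson, a dot for an empty cell.
    glyph = {"Blue": "B", "Crimson": "C", None: "."}
    lines = ["   1 2 3"]
    for row in "ABC":
        cells = " ".join(glyph[board[row + col]] for col in "123")
        lines.append(f"{row}  {cells}")
    return "\n".join(lines)


def open_cells(board: Dict[str, Optional[str]]) -> List[str]:
    """List the empty cells in row-major order (the legal placement targets)."""
    return [row + col for row in "ABC" for col in "123" if board[row + col] is None]


# Board taken from the Game State Schema example.
board = {r + c: None for r in "ABC" for c in "123"}
board.update({"A1": "Blue", "A3": "Crimson", "B1": "Blue", "B2": "Crimson"})
print(render_board(board))
print(open_cells(board))
```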
**Few-Shot Examples**
```
Example valid response:
I will claim the center of the grid to control diagonals.
\boxed{{[Place: B2]}}
Example invalid response:
I think I'll move now.
\boxed{{[Move: B2]}} ← "Move" not a valid token.
```
The function `_extract_answer_content(self, action: str) -> str` will remove `\boxed{{}}` wrappers and yield `[Place: X#]` for validation.
---
### 10. API Mapping Plan
- **`reset(seed)`**
  - Sets initial empty `board`, `turn_index=0`, and seeds RNG for determinism.
  - Returns the initial observations for both players (Navigator Alpha and Navigator Beta).
- **`step(player_action)`**
  - Extracts action token with `_extract_answer_content`.
  - Validates syntax and target cell availability.
  - Updates `board`, appends to `move_history`, increments `turn_index`.
  - After update, executes terminal checks (victory or draw).
  - Produces new observations describing updated board state.
- **`_generate_player_prompt(player_id)`**
  - Compiles textual description of board, current scores, and open cells.
  - Lists permitted `[Place: <cell_id>]` choices.
  - Concludes with directive: *Put your final answer within \boxed{{}} at the end of your response.*
All actions and resultant board states are deterministic given identical seeds and action sequences.
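
To show how `reset` and `step` fit together, here is a compact, self-contained sketch. The class name `StarGridSketch`, the return types, and the invalid-move policy (state left unchanged, reason recorded in `invalid_reason`) are all assumptions; the design leaves the policy open and routes reasons through `set_invalid_move`.

```python
import re
from typing import Dict, List, Optional

PLACE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
BOXED = re.compile(r"\\boxed\{\{?(.*?)\}?\}")  # one or two braces (assumption)
SYMBOLS = {"A": "Blue", "B": "Crimson"}
LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),
]


class StarGridSketch:
    """Illustrative environment; not the actual implementation."""

    def reset(self, seed: int) -> Dict[str, str]:
        self.seed = seed  # stored only; no randomness is used
        self.board: Dict[str, Optional[str]] = {r + c: None for r in "ABC" for c in "123"}
        self.turn_index = 0
        self.move_history: List[Dict[str, str]] = []
        self.winner: Optional[str] = None
        self.is_draw = False
        self.invalid_reason: Optional[str] = None
        return {"A": "Navigator Alpha: empty StarGrid", "B": "Navigator Beta: empty StarGrid"}

    @property
    def active_player(self) -> str:
        return "A" if self.turn_index % 2 == 0 else "B"

    def step(self, player_action: str) -> bool:
        """Apply one action for the active player; return True once the game is terminal."""
        self.invalid_reason = None
        if self.winner is not None or self.is_draw:
            self.invalid_reason = "NotYourTurn"
            return True
        found = BOXED.findall(player_action)
        match = PLACE.fullmatch(found[-1].strip()) if found else None
        if match is None:
            self.invalid_reason = "MalformedAction"
            return False
        cell = match.group(1) + match.group(2)
        if self.board[cell] is not None:
            self.invalid_reason = "CellOccupied"
            return False
        player = self.active_player
        self.board[cell] = SYMBOLS[player]
        self.move_history.append({"player": player, "action": f"[Place: {cell}]"})
        self.turn_index += 1
        if any(all(self.board[c] == SYMBOLS[player] for c in line) for line in LINES):
            self.winner = player
        elif self.turn_index == 9:
            self.is_draw = True
        return self.winner is not None or self.is_draw


env = StarGridSketch()
env.reset(42)
done = False
for cell in ["A1", "A2", "B1", "B2", "C1"]:
    done = env.step("\\boxed{{[Place: " + cell + "]}}")
print(env.winner, env.turn_index, done)
```

In the replay above, Navigator Alpha claims `A1`, `B1`, and `C1`, completing the first column on the fifth turn.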
---
### 11. Copy-Check Against the Example
- The environment, terminology, and objective are **entirely original**.
- There is **no negotiation**, **no trading**, **no resource exchange**, and **no alignment with any bargaining mechanics** from the example environment.
- Entities (“Navigator Alpha/Beta,” “energy beacons,” “StarGrid”) and game state keys (`board`, `player_symbols`, `move_history`, etc.) are unique to this design.
- The theme is cosmic grid conquest, **not** any prior example domain.
---
**End of Design Document “StarGrid Duel”**