# GAME DESIGN DOCUMENT — **"StarGrid Duel"**
---
### 1. Concept Paragraph
**StarGrid Duel** is a deterministic, turn-based strategy game inspired by the simplicity of grid conquest, but it is **not** tic-tac-toe. Two rival star navigators take turns deploying *energy beacons* on a 3×3 stellar grid. Their aim is to align three of their own beacons in a straight line of cosmic power (horizontal, vertical, or diagonal) before the opponent does, or to fill the grid entirely for a balanced standoff. Players issue commands like `[Place: A2]` to deposit a beacon on a coordinate. The environment is purely deterministic: no randomness or negotiation mechanics are involved. The game's purpose is to measure spatial foresight and terminal pattern recognition; it is unrelated to any negotiation or resource-trading examples.
---
### 2. Roles and Win Condition
- **Roles**
  - **Player A ("Navigator Alpha")**: Uses energy color **Blue**.
  - **Player B ("Navigator Beta")**: Uses energy color **Crimson**.
- **Objective**
  Be the first navigator to align three of your beacons continuously (row, column, or diagonal) on the 3×3 StarGrid.
- **Win Rule**
  - A player **wins** immediately upon forming a line of three of their own symbols.
  - The game is a **draw** if all nine cells are filled without a three-in-a-line configuration.
  - Upon win or draw, the game enters a terminal state and no further actions are accepted.
---
### 3. Turn Structure and Determinism
- Players alternate turns beginning with Player A at turn index `0`.
- Each turn is atomic: exactly one action is taken.
- A deterministic seed ensures that initialization and any potential random ordering (none required here, but included for reproducibility) follow identical patterns.
- The turn counter increments after each valid action. Once nine valid turns have been processed or a win condition is met, the environment halts.
---
### 4. Action Grammar (Machine-Parsable)
Players specify grid placement commands targeting one unused cell.
**Allowed Actions**
```
[Place: <cell_id>]
```
**Cell IDs**
Valid values: `A1, A2, A3, B1, B2, B3, C1, C2, C3` (Rows A–C, Columns 1–3)
**Formal Pattern (Regex)**
`^\[Place:\s*(A|B|C)(1|2|3)\]$`
**Examples**
- **Valid:** `[Place: B2]` → Places the player's beacon in the center cell.
- **Invalid Examples:**
  - `[place: B2]` → Invalid capitalization of the action token (`Place` is case-sensitive).
  - `[Place: D1]` → `D1` not in allowed grid range.
  - `[Deploy: A1]` → Invalid action token.
  - `[Place: B2 extra]` → Extra text violates strict grammar.
All player outputs will later be wrapped in `\boxed{{…}}`. The implementation extracts the inner `[Place: X#]` command and validates it against the pattern above.
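As a minimal sketch of the grammar check, the pattern above can be compiled with Python's `re` module. The helper name `parse_place` is hypothetical and not part of the design. `fullmatch` is used because `$` on its own would also accept a trailing newline.

```python
import re
from typing import Optional

# Pattern copied from the grammar above: group 1 is the row, group 2 the column.
PLACE_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")


def parse_place(command: str) -> Optional[str]:
    """Return the cell id (e.g. "B2"), or None if the command is not well formed.

    Hypothetical helper for illustration only.
    """
    match = PLACE_RE.fullmatch(command)  # fullmatch also rejects a trailing newline
    if match is None:
        return None
    return match.group(1) + match.group(2)


print(parse_place("[Place: B2]"))        # the valid example
print(parse_place("[Place: B2 extra]"))  # one of the invalid examples
```

Because the whitespace after the colon is `\s*`, the pattern as written also accepts `[Place:B2]` with no space.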
---
### 5. Game State Schema
```json
{
  "turn_index": 4,
  "active_player": "A",
  "board": {
    "A1": "Blue",
    "A2": null,
    "A3": "Crimson",
    "B1": "Blue",
    "B2": "Crimson",
    "B3": null,
    "C1": null,
    "C2": null,
    "C3": null
  },
  "player_symbols": {
    "A": "Blue",
    "B": "Crimson"
  },
  "move_history": [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"}
  ],
  "winner": null,
  "is_draw": false,
  "observations": {
    "A": "Text transcript of latest game state for Alpha",
    "B": "Text transcript of latest game state for Beta"
  },
  "seed": 42
}
```
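The schema has some redundancy: `board`, `turn_index`, and `active_player` can all be derived from `move_history`. The sketch below replays a history to rebuild those fields, which may be useful as a consistency check. The function name `replay` is hypothetical, and the sketch assumes every recorded action is a valid `[Place: X#]` command. Under the alternation rule in Section 3, a history of four moves leaves Player A to move.

```python
import re
from typing import Dict, List, Optional

PLACE_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
CELLS = [row + col for row in "ABC" for col in "123"]
SYMBOLS = {"A": "Blue", "B": "Crimson"}


def replay(move_history: List[Dict[str, str]]) -> Dict[str, object]:
    """Rebuild board, turn_index and active_player from a move history.

    Hypothetical helper; raises ValueError on an inconsistent history.
    """
    board: Dict[str, Optional[str]] = {cell: None for cell in CELLS}
    for index, move in enumerate(move_history):
        expected = "A" if index % 2 == 0 else "B"  # Player A moves at even indices
        if move["player"] != expected:
            raise ValueError(f"move {index} was made out of turn")
        match = PLACE_RE.fullmatch(move["action"])
        if match is None:
            raise ValueError(f"move {index} is not a valid Place command")
        cell = match.group(1) + match.group(2)
        if board[cell] is not None:
            raise ValueError(f"move {index} targets occupied cell {cell}")
        board[cell] = SYMBOLS[move["player"]]
    turn_index = len(move_history)
    return {
        "turn_index": turn_index,
        "active_player": "A" if turn_index % 2 == 0 else "B",
        "board": board,
    }


history = [
    {"player": "A", "action": "[Place: A1]"},
    {"player": "B", "action": "[Place: A3]"},
    {"player": "A", "action": "[Place: B1]"},
    {"player": "B", "action": "[Place: B2]"},
]
state = replay(history)
print(state["turn_index"], state["active_player"])
```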
---
### 6. Initialization Rules
- `reset(seed)` initializes an empty 3×3 board with all cells `null`.
- The turn index resets to `0` with `active_player = "A"`.
- The same seed always ensures that turn order, board labeling, and any deterministic tie logic behave identically.
- Both players receive an onboarding observation describing:
  - Empty StarGrid layout
  - Their color and symbol
  - Instructions and the legal action syntax
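The initialization rules could be sketched as a standalone function like the one below. The wording of the onboarding text and the grid rendering are assumptions made for illustration; the design only specifies what the observation must cover. Since the rules need no randomness, this sketch records the seed without creating an RNG.

```python
from typing import Dict, Optional, Tuple

SYMBOLS = {"A": "Blue", "B": "Crimson"}
NAMES = {"A": "Navigator Alpha", "B": "Navigator Beta"}


def reset(seed: int) -> Tuple[dict, Dict[str, str]]:
    """Return a fresh state dict and the onboarding observation per player."""
    board: Dict[str, Optional[str]] = {r + c: None for r in "ABC" for c in "123"}
    # Assumed rendering: column header, then one row per letter, "." for empty.
    empty_grid = "\n".join(["  1 2 3"] + [f"{r} . . ." for r in "ABC"])
    observations = {
        pid: (
            f"You are {NAMES[pid]}; your beacon color is {SYMBOLS[pid]}.\n"
            f"The StarGrid is empty:\n{empty_grid}\n"
            "Reply with [Place: <cell_id>], for example [Place: B2]."
        )
        for pid in ("A", "B")
    }
    state = {
        "turn_index": 0,
        "active_player": "A",
        "board": board,
        "player_symbols": dict(SYMBOLS),
        "move_history": [],
        "winner": None,
        "is_draw": False,
        "observations": observations,
        "seed": seed,  # stored for reproducibility; no randomness is drawn
    }
    return state, observations


state, observations = reset(42)
print(observations["A"])
```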
---
### 7. Validation and Error Handling
- Upon receiving a player move, extract the content inside `\boxed{{}}` using `_extract_answer_content`.
- Validate against the regex `^\[Place:\s*(A|B|C)(1|2|3)\]$`.
- Check that the specified cell is unoccupied.
- **Invalid Move Reasons**
  - `"MalformedAction"`: Does not match required pattern.
  - `"CellOutOfRange"`: Coordinate not part of StarGrid labels.
  - `"CellOccupied"`: Target cell already taken.
  - `"NotYourTurn"`: Attempt to act out of sequence, for example after the game has ended or while it is the other player's turn.
The environment calls `set_invalid_move(reason)` with a human-readable reason, retaining determinism (the turn is forfeited or handled as draw according to policy).
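One way to map a command to these reason codes is sketched below. Note that the strict regex already rejects out-of-range coordinates such as `D1`, so telling `"CellOutOfRange"` apart from `"MalformedAction"` needs a second check. The looser pattern `LOOSE_RE` and the function name `invalid_reason` are assumptions for illustration, not part of the design. The sketch returns the reason string instead of calling `set_invalid_move(reason)`.

```python
import re
from typing import Optional

STRICT_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
# Hypothetical looser pattern: right shape, but any letter/digit coordinate.
LOOSE_RE = re.compile(r"^\[Place:\s*[A-Z][0-9]\]$")


def invalid_reason(command: str, player: str, state: dict) -> Optional[str]:
    """Return a reason code for an invalid move, or None if the move is legal."""
    game_over = state["winner"] is not None or state["is_draw"]
    if game_over or player != state["active_player"]:
        return "NotYourTurn"
    strict = STRICT_RE.fullmatch(command)
    if strict is None:
        # Well-shaped command with a coordinate outside A1..C3, else malformed.
        return "CellOutOfRange" if LOOSE_RE.fullmatch(command) else "MalformedAction"
    cell = strict.group(1) + strict.group(2)
    if state["board"][cell] is not None:
        return "CellOccupied"
    return None


demo_state = {
    "active_player": "B",
    "winner": None,
    "is_draw": False,
    "board": {r + c: None for r in "ABC" for c in "123"},
}
demo_state["board"]["A1"] = "Blue"
print(invalid_reason("[Place: A1]", "B", demo_state))
```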
---
### 8. Terminal Conditions and Scoring
**Checks run each turn, immediately after a valid beacon is placed:**
1. **Victory Check:** If the current player's beacons form any of the eight winning line patterns, set `winner = active_player` and terminate the game.
2. **Draw Check:** If no empty cells remain and no winner exists, set `is_draw = true`.
3. **Scoring**
   - Win: `+1` score for winner, `0` for loser.
   - Draw: `0.5` each as tie credit (for potential series mode).

**Tie-Break Procedure**
If a single placement completes more than one line at once (for example, a row and a column through the same cell), the first detected alignment pattern is applied deterministically. Both players can never win on the same turn, because only the active player's beacons change.
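The terminal checks can be sketched as follows. The eight lines are enumerated in a fixed order, so the first detected alignment is always the same for a given board. The function names `check_terminal` and `scores` are hypothetical, and returning zero scores for an unfinished game is an assumption.

```python
from typing import Dict, Optional, Tuple

# Three rows, three columns, two diagonals, in a fixed detection order.
WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),
]
SYMBOLS = {"A": "Blue", "B": "Crimson"}


def check_terminal(
    board: Dict[str, Optional[str]], active_player: str
) -> Tuple[Optional[str], bool]:
    """Return (winner, is_draw) after the active player has just placed."""
    mine = SYMBOLS[active_player]
    for line in WIN_LINES:
        if all(board[cell] == mine for cell in line):
            return active_player, False  # victory check comes first
    is_draw = all(value is not None for value in board.values())
    return None, is_draw


def scores(winner: Optional[str], is_draw: bool) -> Dict[str, float]:
    """Apply the scoring rule: 1/0 on a win, 0.5 each on a draw."""
    if is_draw:
        return {"A": 0.5, "B": 0.5}
    if winner is None:
        return {"A": 0.0, "B": 0.0}  # assumption: no points while in progress
    return {p: (1.0 if p == winner else 0.0) for p in ("A", "B")}


board = {r + c: None for r in "ABC" for c in "123"}
board.update({"A1": "Blue", "A2": "Blue", "A3": "Blue", "B1": "Crimson", "B2": "Crimson"})
winner, is_draw = check_terminal(board, "A")
print(winner, is_draw, scores(winner, is_draw))
```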
---
### 9. Player Prompt Specification
Each player receives a structured prompt reflecting the current board and legal moves.
**Prompt Outline**
> **Identity Blurb:**
> You are a star navigator placing energy beacons on a galactic grid. Each cell you claim radiates your color's energy. The goal is to align three of your beacons in a line before the opponent.
> **Current Board State:**
> - Display a 3×3 grid with coordinates and current occupancy.
> **Your Color:** Blue or Crimson
> **Turn Information:** Which player moves next (`Navigator Alpha` or `Navigator Beta`)
> **Allowed Actions:**
> Format: `[Place: <cell_id>]`, where `<cell_id>` ∈ {A1,…,C3} and the cell must be empty.
> You must wrap your selected action inside `\boxed{{}}` at the end of your message.
> **Response Format:**
> You may reason about your move, then output your final choice within `\boxed{{}}`.
**Few-Shot Examples**
```
Example valid response:
I will claim the center of the grid to control diagonals.
\boxed{{[Place: B2]}}
Example invalid response:
I think I'll move now.
\boxed{{[Move: B2]}} ← "Move" not a valid token.
```
The function `_extract_answer_content(self, action: str) -> str` will remove `\boxed{{}}` wrappers and yield `[Place: X#]` for validation.
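A possible shape for the extraction step is sketched below as a standalone function, not the actual `_extract_answer_content` method. It assumes the wrapper may appear with either one or two brace pairs, that the last boxed span in a response is the final answer, and that an empty string is returned when no wrapper is present. None of these behaviors is specified by the design.

```python
import re

# Accepts \boxed{...} as well as \boxed{{...}}; the payload is captured lazily.
BOXED_RE = re.compile(r"\\boxed\{\{?(.*?)\}?\}", re.DOTALL)


def extract_answer_content(action: str) -> str:
    """Return the text inside the last \\boxed wrapper, or "" if there is none."""
    matches = BOXED_RE.findall(action)
    return matches[-1].strip() if matches else ""


valid = r"I will claim the center of the grid to control diagonals. \boxed{{[Place: B2]}}"
invalid = r"I think I'll move now. \boxed{{[Move: B2]}}"
print(extract_answer_content(valid))
print(extract_answer_content(invalid))
```

The extracted string is only unwrapped here; whether it is a legal command is decided by the grammar check in Section 4.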
---
### 10. API Mapping Plan
- **`reset(seed)`**
  - Sets initial empty `board`, `turn_index=0`, and seeds RNG for determinism.
  - Returns initial observations for both players (`"Navigator Alpha"`, `"Navigator Beta"`).
- **`step(player_action)`**
  - Extracts action token with `_extract_answer_content`.
  - Validates syntax and target cell availability.
  - Updates `board`, appends to `move_history`, increments `turn_index`.
  - After update, executes terminal checks (victory or draw).
  - Produces new observations describing updated board state.
- **`_generate_player_prompt(player_id)`**
  - Compiles textual description of board, current scores, and open cells.
  - Lists permitted `[Place: <cell_id>]` choices.
  - Concludes with directive: *Put your final answer within \boxed{{}} at the end of your response.*
All actions and resultant board states are deterministic given identical seeds and action sequences.
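To show how these pieces might fit together, here is a compact sketch of `reset` and `step` on a hypothetical `StarGridDuel` class. It omits observations and prompts, and it makes one policy assumption the design leaves open: an invalid move is recorded in `invalid_reason` and the same player stays on turn.

```python
import re
from typing import Dict, List, Optional

PLACE_RE = re.compile(r"^\[Place:\s*(A|B|C)(1|2|3)\]$")
BOXED_RE = re.compile(r"\\boxed\{\{?(.*?)\}?\}", re.DOTALL)
SYMBOLS = {"A": "Blue", "B": "Crimson"}
WIN_LINES = [
    ("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3"),
    ("A1", "B1", "C1"), ("A2", "B2", "C2"), ("A3", "B3", "C3"),
    ("A1", "B2", "C3"), ("A3", "B2", "C1"),
]


class StarGridDuel:
    """Hypothetical minimal environment; not the reference implementation."""

    def reset(self, seed: int) -> None:
        self.seed = seed  # kept for reproducibility; the rules use no randomness
        self.board: Dict[str, Optional[str]] = {r + c: None for r in "ABC" for c in "123"}
        self.turn_index = 0
        self.move_history: List[Dict[str, str]] = []
        self.winner: Optional[str] = None
        self.is_draw = False
        self.invalid_reason: Optional[str] = None

    @property
    def active_player(self) -> str:
        return "AB"[self.turn_index % 2]  # Player A moves at even turn indices

    def step(self, player_action: str) -> bool:
        """Apply one action and return True once the game is over."""
        if self.winner is not None or self.is_draw:
            self.invalid_reason = "NotYourTurn"
            return True
        boxed = BOXED_RE.findall(player_action)
        command = boxed[-1].strip() if boxed else ""
        match = PLACE_RE.fullmatch(command)
        if match is None:
            self.invalid_reason = "MalformedAction"
            return False
        cell = match.group(1) + match.group(2)
        if self.board[cell] is not None:
            self.invalid_reason = "CellOccupied"
            return False
        player = self.active_player
        self.board[cell] = SYMBOLS[player]
        self.move_history.append({"player": player, "action": command})
        self.turn_index += 1
        self.invalid_reason = None
        if any(all(self.board[c] == SYMBOLS[player] for c in line) for line in WIN_LINES):
            self.winner = player
        elif self.turn_index == 9:
            self.is_draw = True
        return self.winner is not None or self.is_draw


env = StarGridDuel()
env.reset(42)
for cell in ["A1", "B1", "A2", "B2", "A3"]:
    done = env.step("\\boxed{{[Place: " + cell + "]}}")
print(done, env.winner)
```

Replaying the same seed and action sequence yields the same board and outcome, since nothing in `step` depends on random state.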
---
### 11. Copy-Check Against the Example
- The environment, terminology, and objective are **entirely original**.
- There is **no negotiation**, **no trading**, **no resource exchange**, and **no alignment with any bargaining mechanics** from the example environment.
- Entities (“Navigator Alpha/Beta,” “energy beacons,” “StarGrid”) and game state keys (`board`, `player_symbols`, `move_history`, etc.) are unique to this design.
- The theme is cosmic grid conquest, **not** any prior example domain.
---
**End of Design Document “StarGrid Duel”**