# **Game Design Document: “Labyrinth Command”**
---
## 1. Concept Paragraph
**“Labyrinth Command”** is a deterministic, turn-based, two-player tactical maze exploration game. Two rival explorers are trapped inside a grid-shaped labyrinth and must reach the **Central Beacon** at the maze's heart before their opponent. Each turn, a player issues one command from a fixed grammar of movement and interaction tokens (e.g., `[Move:North]`, `[Scan]`, `[Wait]`). The maze layout, beacon position, and obstacles are generated deterministically from a single seed, ensuring reproducibility. The game is **not** related to any economic, negotiation, or resource-trading example; its theme focuses purely on spatial logic and exploration within a confined environment.
---
## 2. Roles and Win Condition
**Roles**
- **Explorer A** and **Explorer B** are rival adventurers in identical labyrinth conditions.
- Both start at distinct, opposite corners of the maze.
**Objectives**
- Reach the **Central Beacon Cell (B)** before the opponent.
- A secondary scoring system tracks proximity to the Beacon at game end if neither player reaches it within the turn limit.
**Win Rule**
1. A player *wins immediately* if they enter the Beacon cell first.
2. If both reach simultaneously on the same turn: **Draw**.
3. If turn limit expires with no beacon reached: player closer (Manhattan distance) to the Beacon **wins**.
4. If both are equally distant: **Draw**.
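
Win rules 3 and 4 can be sketched as a small tie-break function. This is a minimal illustration, assuming positions are two-element coordinate lists as in the state schema; the names `manhattan` and `resolve_timeout` are hypothetical and not part of the design.

```python
def manhattan(a, b):
    """Manhattan distance between two cells given as [x, y] pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def resolve_timeout(pos_a, pos_b, beacon):
    """Apply win rules 3 and 4 once the turn limit has expired.

    Hypothetical helper: returns "A", "B", or "Draw".
    """
    dist_a = manhattan(pos_a, beacon)
    dist_b = manhattan(pos_b, beacon)
    if dist_a < dist_b:
        return "A"
    if dist_b < dist_a:
        return "B"
    return "Draw"
```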
---
## 3. Turn Structure and Determinism
- The game proceeds in **alternating turns**, starting with Explorer A.
- Each turn = one player action followed by environment update and opponent observation.
- **Turn limit:** 20 turns per player (40 total).
- Maze generation and beacon placement use a **seed** value set at `reset`, guaranteeing fully deterministic structure and outcomes for identical seeds.
- All elements of randomness (e.g., obstacle positions) derive from this same seed.
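
One way to obtain this kind of determinism in Python is to route every random choice through a private `random.Random(seed)` instance. The sketch below is an assumption-laden illustration, not the document's maze algorithm: the function name, the default obstacle count, and the choice to keep the corners and the beacon cell free are all hypothetical, and it does not check that the beacon stays reachable.

```python
import random


def generate_blocked_cells(seed, width=7, height=7, n_blocked=3):
    """Pick blocked cells deterministically from a seed (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, independent of global state
    beacon = (width // 2, height // 2)
    # Assumption: start corners and the beacon cell are never blocked.
    reserved = {(0, 0), (width - 1, height - 1), beacon}
    candidates = [
        (x, y)
        for x in range(width)
        for y in range(height)
        if (x, y) not in reserved
    ]
    # Same seed and same arguments always yield the same sample.
    return sorted(rng.sample(candidates, n_blocked))
```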
---
## 4. Action Grammar (Machine-Parseable)
**Allowed Action Tokens (case-sensitive):**
| Token Pattern | Meaning |
|----------------|----------|
| `[Move:Direction]` | Move one cell in a cardinal direction (`North`, `South`, `East`, `West`) if not blocked. |
| `[Scan]` | Reveal contents of adjacent cells to update the player's visible map. |
| `[Wait]` | Skip the move, useful for strategic timing. |
**Formal Patterns (Regex-style):**
1. `^\[Move:(North|South|East|West)\]$`
2. `^\[Scan\]$`
3. `^\[Wait\]$`
**Examples**
| Action | Validity | Explanation |
|--------|-----------|-------------|
| `[Move:North]` | ✅ Valid | Matches move pattern |
| `[Scan]` | ✅ Valid | Matches scan pattern |
| `[Wait]` | ✅ Valid | Matches wait pattern |
| `[Move:Northeast]` | ❌ Invalid | Direction not allowed |
| `[move:North]` | ❌ Invalid | Case-sensitive mismatch |
| `[Attack]` | ❌ Invalid | Unsupported token |
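
The three patterns can be checked with Python's `re` module. The sketch below folds them into one compiled pattern and uses `fullmatch`, which plays the role of the `^` and `$` anchors; the names `ACTION_PATTERN` and `is_valid_token` are illustrative only.

```python
import re

# One pattern covering the three token forms; matching is case-sensitive by default.
ACTION_PATTERN = re.compile(r"\[(?:Move:(?:North|South|East|West)|Scan|Wait)\]")


def is_valid_token(token):
    """Return True if the token matches the action grammar exactly."""
    return ACTION_PATTERN.fullmatch(token) is not None
```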
---
## 5. Game State Schema
```json
{
  "seed": 18457,
  "turn_index": 6,
  "max_turns": 40,
  "maze_width": 7,
  "maze_height": 7,
  "beacon_position": [3, 3],
  "cells_blocked": [[0,1],[2,2],[4,5]],
  "player_states": {
    "A": {
      "position": [0,0],
      "visible_map": [["?", "X", "?", "?"],["?", ".", ".", "?"],["?", "?", ".", "?"]],
      "visited_cells": [[0,0],[1,0]],
      "last_action": "[Move:South]"
    },
    "B": {
      "position": [6,6],
      "visible_map": [["?", ".", "?"],[".", ".", "?"],["?", "?", "?"]],
      "visited_cells": [[6,6]],
      "last_action": "[Scan]"
    }
  },
  "transcript": [
    {"player":"A", "action":"[Move:South]"},
    {"player":"B", "action":"[Scan]"}
  ],
  "winner": null,
  "terminated": false
}
```
---
## 6. Initialization Rules
- Maze layout generated through seeded deterministic algorithm (`seed` provided or auto-generated).
- Both players placed:
  - Explorer A → top-left corner `[0,0]`
  - Explorer B → bottom-right corner `[width-1,height-1]`
- Beacon placed at center `(width//2, height//2)`.
- `visible_map` initialized with limited visibility: it covers only the 3×3 region around the player, with each cell either marked or unknown (`?`).
- At `reset`, each player receives:
  - Maze dimensions
  - Starting coordinates
  - Number of turns and win condition summary
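
The placement rules and the 3×3 visibility window can be sketched as follows. This is a minimal illustration assuming `[x, y]` coordinates with the origin at the top-left corner; `starting_layout` and `visible_window` are hypothetical names, and how the window is rendered into `visible_map` is left open.

```python
def starting_layout(width, height):
    """Start cells for both explorers and the beacon cell."""
    return {
        "A": [0, 0],
        "B": [width - 1, height - 1],
        "beacon": [width // 2, height // 2],
    }


def visible_window(position, width, height):
    """In-bounds cells of the 3x3 region centred on position, row by row."""
    px, py = position
    return [
        [x, y]
        for y in range(py - 1, py + 2)
        for x in range(px - 1, px + 2)
        if 0 <= x < width and 0 <= y < height
    ]
```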
---
## 7. Validation and Error Handling
**Invalid Move Detection Rules**
- Action not matching one of the defined regex patterns → `Invalid token format`
- Action would move explorer outside maze bounds → `Move out of bounds`
- Action would move explorer into blocked cell → `Cell blocked`
- Any attempt made after terminal state → `Game already finished`
System calls `set_invalid_move(player, reason)` upon detection.
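
The detection rules above might be combined into a single check that returns the reason string to pass to `set_invalid_move`. The sketch below is hypothetical: `check_action` is not named in the design, the terminal-state check is run first as a design choice, and the direction deltas assume `[x, y]` coordinates in which North decreases `y`.

```python
import re

TOKEN = re.compile(r"\[(?:Move:(North|South|East|West)|Scan|Wait)\]")
# Assumption: [x, y] coordinates, origin top-left, North decreases y.
DELTAS = {"North": (0, -1), "South": (0, 1), "East": (1, 0), "West": (-1, 0)}


def check_action(state, player, token):
    """Return an error reason from section 7, or None if the action is legal."""
    if state["terminated"]:
        return "Game already finished"
    match = TOKEN.fullmatch(token)
    if match is None:
        return "Invalid token format"
    direction = match.group(1)
    if direction is None:  # [Scan] and [Wait] never move the explorer
        return None
    dx, dy = DELTAS[direction]
    x, y = state["player_states"][player]["position"]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < state["maze_width"] and 0 <= ny < state["maze_height"]):
        return "Move out of bounds"
    if [nx, ny] in state["cells_blocked"]:
        return "Cell blocked"
    return None
```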
---
## 8. Terminal Conditions and Scoring
**Terminal Triggers**
1. Player enters the Beacon cell → Win for that player.
2. Both reach Beacon simultaneously → Draw.
3. Turn limit reached → Compare distance to Beacon.
   - Smaller Manhattan distance → Win.
   - Equal → Draw.
**Scoring Computation**
- Winner gets `1`, loser `0`, draw `0.5`.
- Stored in `winner` key as `"A"`, `"B"`, or `"Draw"`.
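
The scoring rule maps the `winner` value to per-player scores. A minimal sketch, with `scores_for` as a hypothetical helper name and an error raised for an unfinished game as an added assumption:

```python
def scores_for(winner):
    """Map the `winner` field ("A", "B", or "Draw") to per-player scores."""
    if winner == "Draw":
        return {"A": 0.5, "B": 0.5}
    if winner in ("A", "B"):
        loser = "B" if winner == "A" else "A"
        return {winner: 1.0, loser: 0.0}
    # Assumption: asking for scores before the game ends is a caller error.
    raise ValueError(f"game not finished or unknown winner: {winner!r}")
```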
---
## 9. Player Prompt Specification
**Prompt Content Outline**
- Game title and theme summary
- Player's identity (Explorer A or B)
- Current turn number and limits
- Player's current position, visible map grid, and last known opponent action
- List of allowable command formats
- Reminder to place final command inside `\boxed{{}}`
- Examples of valid vs invalid formatting
**Prompt Example**
```
You are Explorer A navigating the labyrinth. Your goal is to reach the Central Beacon before your rival.
You can issue ONE command per turn using the following grammar:
[Move:North] | [Move:South] | [Move:East] | [Move:West] | [Scan] | [Wait]
Remember:
- Moving into blocked walls or out of bounds is invalid.
- The beacon lies at the labyrinth's center.
- You must wrap your command inside \boxed{{}}.
Example valid response:
I want to go north to advance toward the beacon.
\boxed{{[Move:North]}}
Example invalid response:
Let's head northeast. ← invalid direction keyword
Now it is your turn. Choose your next command carefully.
Put your final answer within \boxed{{}} at the end of your response.
```
**Helper:** `_extract_answer_content(self, action: str) -> str`
Extracts the content enclosed by `\boxed{{...}}` for validation and execution.
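
A possible shape for this helper, written as a standalone function rather than a method. It assumes that the doubled braces in this document are format-string escapes and that the rendered prompt asks for `\boxed{...}`; the pattern tolerates either form, takes the last boxed answer in the response, and returns an empty string when none is present. These behaviors are assumptions, not confirmed details of the design.

```python
import re

# A non-greedy match is enough here because valid tokens never contain braces.
BOXED = re.compile(r"\\boxed\{+\s*(.*?)\s*\}+", re.DOTALL)


def extract_answer_content(response):
    """Return the content of the last boxed answer, or "" if there is none."""
    matches = BOXED.findall(response)
    return matches[-1] if matches else ""
```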
---
## 10. API Mapping Plan
**reset()**
- Generate deterministic maze grid based on seed.
- Initialize all fields of `game_state` per schema.
- Return initial observation for each player, including map visibility and rules summary.
**step(player_action)**
- Use `_extract_answer_content` to unwrap the boxed token.
- Validate with grammar and state constraints.
- If invalid → call `set_invalid_move`.
- If valid → mutate player position/visibility, append to `transcript`.
- Perform terminal condition checks after each move; update `winner` and `terminated` appropriately.
- Return resulting state observation and game status.
**_generate_player_prompt(player_id)**
- Construct text prompt per section 9.
- Include available moves, last opponent move, remaining turns, and map details.
- Append "Put your final answer within \boxed{{}} at the end of your response."
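
The prompt construction could look roughly like the sketch below. It is a hypothetical standalone version of `_generate_player_prompt` that reads the fields defined in the state schema; the exact wording, the map rendering, and the single-brace `\boxed{}` in the rendered text are assumptions.

```python
def render_map(visible_map):
    """Render the visible map as one space-separated line per row."""
    return "\n".join(" ".join(row) for row in visible_map)


def build_prompt(player_id, state):
    """Assemble a player prompt from the game state (illustrative wording)."""
    me = state["player_states"][player_id]
    rival_id = "B" if player_id == "A" else "A"
    rival = state["player_states"][rival_id]
    remaining = state["max_turns"] - state["turn_index"]
    lines = [
        f"You are Explorer {player_id} navigating the labyrinth.",
        f"Turn {state['turn_index']} of {state['max_turns']} ({remaining} remaining).",
        f"Your position: {me['position']}",
        "Your visible map:",
        render_map(me["visible_map"]),
        f"Rival's last action: {rival['last_action']}",
        "Allowed commands: [Move:North] | [Move:South] | [Move:East] | [Move:West] | [Scan] | [Wait]",
        "Put your final answer within \\boxed{} at the end of your response.",
    ]
    return "\n".join(lines)
```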
---
## 11. Copy-Check Against the Example
- The **Labyrinth Command** game has an *exploration and spatial logic* theme, **not** negotiation, trade, or economy-related.
- All entities—**maze**, **beacon**, **blocked cells**, and **explorers**—are original constructs.
- Action tokens `[Move:…]`, `[Scan]`, `[Wait]`, and state keys (`beacon_position`, `cells_blocked`, `visible_map`) are unique to this design.
- No resource exchanges, offers, or bargaining are present.
---
**End of Design Document “Labyrinth Command”**