# Game Design Document: **Labyrinth Conquest**
---
## 1. Concept Paragraph
**Game Concept:**
*Labyrinth Conquest* is a **turn-based, deterministic grid-navigation strategy game** for two players competing to retrieve a relic hidden within a shifting labyrinth. Each player commands an **Explorer**, represented by a marker on a square grid of tiles. The labyrinth contains walls, traps, and hazards that limit movement but are fully known to both players. Players alternate turns choosing actions to **Move**, **Rotate Tiles**, or **Activate Gadgets** in order to reach the central **Relic Tile** first. This design is **entirely original and unrelated to negotiation or trade-based gameplay**. The environment's challenge lies in spatial reasoning and path optimization.
---
## 2. Roles and Win Condition
**Roles:**
- **Player A** and **Player B** each control a distinct Explorer starting from opposite corners of the labyrinth.
- Both can observe the entire labyrinth state at all times.
**Win Condition:**
- The first player to move their Explorer onto the **Relic Tile** wins the game immediately (`winner = current_player`).
- If neither player reaches the relic before the turn limit expires (40 turns per player; see Section 3), the winner is the player **closest (by Manhattan distance)** to the relic.
- If both are equidistant, the result is declared a **Draw**.
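
The timeout rule above can be sketched as follows. This is a minimal illustration, not the environment's actual code: the function names `manhattan` and `resolve_timeout` are hypothetical, and positions are assumed to be coordinate pairs.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) coordinate pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def resolve_timeout(pos_a, pos_b, relic):
    """Return "A", "B", or None (draw) once the turn limit is reached.

    Hypothetical helper: the closer Explorer wins, equal distance is a draw.
    """
    dist_a = manhattan(pos_a, relic)
    dist_b = manhattan(pos_b, relic)
    if dist_a < dist_b:
        return "A"
    if dist_b < dist_a:
        return "B"
    return None  # equidistant, so the game is a draw
```

For example, on a 5x5 grid with the relic at `(2,2)`, Explorers still sitting at `(0,0)` and `(4,4)` are both 4 steps away, so the result is a draw.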
---
## 3. Turn Structure and Determinism
- Players alternate turns strictly: Player A → Player B → Player A → …
- Each turn consists of **one valid action**.
- Determinism is ensured by:
  - A fixed grid layout and trap positions controlled by the RNG seed.
  - Any randomized initial layout generation using the provided `seed` for exact reproducibility.
- Maximum turn limit: **40 turns per player** (80 total).
- Game ends immediately if a terminal condition is met.
---
## 4. Action Grammar (Machine-Parseable)
### Action Types:
Players may issue exactly one of the following tokens per turn, enclosed in `\boxed{{}}` during play.
---
#### 1. **[Move: <direction>]**
- Moves the player's Explorer one tile in a cardinal direction if no wall blocks the path.
- `<direction>` ∈ {`N`, `S`, `E`, `W`}
**Regex:**
`^\[Move: (N|S|E|W)\]$`
**Example valid:** `[Move: N]`
**Example invalid:** `[Move: north]` → Invalid because the direction must be one of the uppercase letters `N`, `S`, `E`, `W`.
---
#### 2. **[Rotate: <x>,<y>,<dir>]**
- Rotates a specified tile at coordinates `(x,y)` one quarter-turn clockwise or counterclockwise.
- `<dir>` ∈ {`CW`, `CCW`}
**Regex:**
`^\[Rotate: [0-9]+,[0-9]+,(CW|CCW)\]$`
**Example valid:** `[Rotate: 2,3,CW]`
**Example invalid:** `[Rotate: x2,3,CW]` → Invalid because coordinate must be numeric.
---
#### 3. **[Activate: <gadget>]**
- Triggers one of the player's available gadgets, for example disarming a trap or shifting a row.
- `<gadget>` ∈ {`Bridge`, `TrapDisarm`, `RowShift`}
**Regex:**
`^\[Activate: (Bridge|TrapDisarm|RowShift)\]$`
**Example valid:** `[Activate: Bridge]`
**Example invalid:** `[Activate: Fly]` → Invalid gadget keyword.
---
### Validation Notes:
Only one token per turn is permitted. Spacing, capitalization, and punctuation must **exactly** match these predefined grammars.
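
The three grammars above could be checked with a small parser along these lines. This is a sketch under assumptions: `parse_action` and the tuple return shape are hypothetical, and capture groups were added to the Rotate pattern so the coordinates can be read back. `re.fullmatch` is used so that a trailing newline is also rejected.

```python
import re

# Patterns copied from the grammar above; the Rotate pattern gains capture
# groups (an addition for this sketch) so x and y can be extracted.
MOVE_RE = re.compile(r"^\[Move: (N|S|E|W)\]$")
ROTATE_RE = re.compile(r"^\[Rotate: ([0-9]+),([0-9]+),(CW|CCW)\]$")
ACTIVATE_RE = re.compile(r"^\[Activate: (Bridge|TrapDisarm|RowShift)\]$")


def parse_action(token):
    """Return a tuple describing the action, or None if the format is invalid."""
    m = MOVE_RE.fullmatch(token)
    if m:
        return ("Move", m.group(1))
    m = ROTATE_RE.fullmatch(token)
    if m:
        return ("Rotate", int(m.group(1)), int(m.group(2)), m.group(3))
    m = ACTIVATE_RE.fullmatch(token)
    if m:
        return ("Activate", m.group(1))
    return None  # maps to "Invalid action format"
```

Because matching is exact, `parse_action("[Move: north]")` and `parse_action("[Rotate: x2,3,CW]")` both return `None`, while `parse_action("[Rotate: 2,3,CW]")` yields `("Rotate", 2, 3, "CW")`.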
---
## 5. Game State Schema
```json
{
  "grid_size": 5,
  "tiles": [
    ["floor", "wall", "trap", "floor", "floor"],
    ["floor", "floor", "wall", "trap", "floor"],
    ["floor", "wall", "relic", "floor", "floor"],
    ["floor", "trap", "floor", "wall", "floor"],
    ["startA", "floor", "floor", "floor", "startB"]
  ],
  "player_states": {
    "A": {
      "position": [0, 0],
      "gadgets": ["Bridge", "TrapDisarm"],
      "moves_taken": 5,
      "distance_to_relic": 4
    },
    "B": {
      "position": [4, 4],
      "gadgets": ["RowShift"],
      "moves_taken": 4,
      "distance_to_relic": 4
    }
  },
  "turn_number": 9,
  "current_player": "A",
  "seed": 42,
  "action_history": [
    "A: [Move: E]",
    "B: [Rotate: 3,3,CW]",
    "A: [Activate: Bridge]"
  ],
  "winner": null,
  "terminated": false,
  "invalid_reason": null,
  "observations": [
    "Game begins. Players start in opposite corners.",
    "A moved east.",
    "B rotated tile (3,3) clockwise."
  ]
}
```
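
A quick structural check of a state dictionary against this schema might look like the sketch below. The helper name `schema_problems` and the specific checks are assumptions made for illustration; the document does not define a validator.

```python
# Top-level keys taken from the schema above.
REQUIRED_KEYS = {
    "grid_size", "tiles", "player_states", "turn_number", "current_player",
    "seed", "action_history", "winner", "terminated", "invalid_reason",
    "observations",
}


def schema_problems(state):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    missing = REQUIRED_KEYS - set(state)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    size = state.get("grid_size")
    tiles = state.get("tiles", [])
    # The grid must be square: grid_size rows of grid_size tiles each.
    if len(tiles) != size or any(len(row) != size for row in tiles):
        problems.append("tiles is not a grid_size x grid_size grid")
    if state.get("current_player") not in ("A", "B"):
        problems.append("current_player must be 'A' or 'B'")
    return problems
```

Running a check like this after `reset` and after each `step` is one inexpensive way to catch a state that has drifted from the schema.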
---
## 6. Initialization Rules
- A seeded RNG (`seed` input at `reset`) controls:
  - Tile placement (`wall`, `trap`, `floor`, `relic`).
  - Starting gadget distributions.
- Starting layout:
  - `startA` at `(0,0)`, `startB` at `(grid_size-1, grid_size-1)`, `relic` at center.
  - Each player begins with **2 random gadgets**.
- The first observation announces the initial labyrinth map and coordinates.
- No randomness is used during play, so the game is fully deterministic after `reset`.
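
Seeded initialization as described above could be sketched like this. The function `generate_layout`, the wall and trap probabilities, and the `tiles[0][0]` indexing of `(0,0)` are all assumptions for illustration; the sketch also makes no attempt to guarantee that a path to the relic exists.

```python
import random

GADGETS = ["Bridge", "TrapDisarm", "RowShift"]


def generate_layout(seed, grid_size=5, wall_prob=0.2, trap_prob=0.15):
    """Build tiles and starting gadgets from a private, seeded RNG.

    wall_prob and trap_prob are illustrative values, not part of the design.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    tiles = []
    for _ in range(grid_size):
        row = []
        for _ in range(grid_size):
            roll = rng.random()
            if roll < wall_prob:
                row.append("wall")
            elif roll < wall_prob + trap_prob:
                row.append("trap")
            else:
                row.append("floor")
        tiles.append(row)
    # Fixed tiles overwrite whatever was rolled at those coordinates.
    center = grid_size // 2
    tiles[0][0] = "startA"
    tiles[grid_size - 1][grid_size - 1] = "startB"
    tiles[center][center] = "relic"
    # Each player begins with 2 random gadgets.
    gadgets = {player: rng.sample(GADGETS, 2) for player in ("A", "B")}
    return tiles, gadgets
```

Calling `generate_layout(42)` twice returns identical results, which is the reproducibility property the `seed` is meant to provide.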
---
## 7. Validation and Error Handling
**Illegal Actions Detected If:**
- The unboxed action string does not match any defined regex pattern → `Reason: "Invalid action format"`
- The target coordinate `(x,y)` is outside the grid → `Reason: "Tile out of bounds"`
- Attempted movement blocked by a wall → `Reason: "Wall blocks path"`
- Gadget already used → `Reason: "Gadget unavailable"`
- Player issues multiple actions or malformed tokens → `Reason: "Multiple or malformed commands"`
When an illegal action is detected, the environment calls `set_invalid_move(player, reason)` and the opponent wins automatically, unless `training_mode` allows a retry.
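
The feasibility checks listed above could be sketched as below, operating on an already-parsed action tuple. Several things here are assumptions: the helper name `check_action`, positions as `(x, y)` with `N` increasing `y` (inferred from the prompt example in Section 9, where moving north from `(0,0)` approaches the relic at `(2,2)`), `tiles[y][x]` indexing, walls as whole tiles as in the schema, and reusing "Tile out of bounds" for a move that leaves the grid.

```python
# Assumed direction vectors: N increases y, E increases x.
DELTAS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def check_action(action, position, tiles, gadgets):
    """Return None if the action is feasible, otherwise the reason string."""
    size = len(tiles)
    kind = action[0]
    if kind == "Move":
        dx, dy = DELTAS[action[1]]
        x, y = position[0] + dx, position[1] + dy
        if not (0 <= x < size and 0 <= y < size):
            return "Tile out of bounds"
        if tiles[y][x] == "wall":
            return "Wall blocks path"
    elif kind == "Rotate":
        _, x, y, _ = action
        if not (0 <= x < size and 0 <= y < size):
            return "Tile out of bounds"
    elif kind == "Activate":
        if action[1] not in gadgets:
            return "Gadget unavailable"
    return None
```

Format errors are assumed to be caught earlier, when the raw token is matched against the grammar, so this function only sees well-formed actions.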
---
## 8. Terminal Conditions and Scoring
**Terminal Checks Each Turn:**
1. If a player's new position contains `"relic"`, `winner = current_player`.
2. If `turn_number >= max_turns`, compute `distance_to_relic` for both.
   - Shorter distance → winner.
   - Equal distance → `winner = null`, `draw = True`.
3. If an invalid move occurs, `winner = opponent`.
**Scoring:**
- `Winner`: +1 point
- `Loser`: 0 points
- `Draw`: both get 0.5 points
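
The scoring table above maps directly onto a small function. The name `score_game` and the dictionary return shape are assumptions made for this sketch.

```python
def score_game(winner):
    """Map a final result to per-player points.

    winner is "A", "B", or None for a draw.
    """
    if winner is None:
        return {"A": 0.5, "B": 0.5}
    return {player: (1.0 if player == winner else 0.0) for player in ("A", "B")}
```

Every outcome distributes exactly one point in total, so results from many games can be summed or averaged without further normalization.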
---
## 9. Player Prompt Specification
Each call to `_generate_player_prompt` presents the labyrinth, Explorer positions, remaining gadgets, turn count, and explicit action grammar.
**Prompt Outline:**
```
You are an Explorer navigating a shifting labyrinth.
Your goal is to reach the Relic Tile before your opponent by issuing one of the allowed commands.
Available actions (case-sensitive):
- [Move: N|S|E|W] — Move one tile in a direction if no wall blocks the way.
- [Rotate: x,y,CW|CCW] — Rotate the tile at coordinates (x,y).
- [Activate: Bridge|TrapDisarm|RowShift] — Use one of your gadgets (if available).
Current Turn: 9
You are Player A. Opponent is Player B.
Your position: (0,0)
Relic position: (2,2)
Available gadgets: Bridge, TrapDisarm
Respond with exactly one valid action token.
Put your final answer within \boxed{{}} at the end of your response.
Example valid response:
I will move north to progress toward the relic.
\boxed{{[Move: N]}}
Example invalid response:
\boxed{{Move north}} ← Invalid format; must include brackets and colon.
```
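
Pulling the boxed token out of a free-text response might be done as in the sketch below. The document names `_extract_answer_content` without showing it, so `extract_boxed_content` is a hypothetical stand-in. It assumes the last `\boxed` occurrence is the final answer and accepts either single or doubled braces, since the outline above writes them doubled.

```python
import re

# One or more opening braces, lazily captured content, one or more closing braces.
BOXED_RE = re.compile(r"\\boxed\{+(.*?)\}+")


def extract_boxed_content(response):
    """Return the content of the last \\boxed{...} in the response, or None."""
    matches = BOXED_RE.findall(response)
    if not matches:
        return None
    return matches[-1].strip()
```

The extracted string still has to pass the action grammar: `\boxed{Move north}` yields `"Move north"`, which is then rejected as an invalid format.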
---
## 10. API Mapping Plan
### `reset(seed=None)`
- Creates a deterministic labyrinth with walls, traps, relic, and player starts.
- Initializes `game_state` following schema.
- Adds initial observations describing layout and objectives.
- Returns `obs` for both players.
### `step(player_id, action)`
- Extracts content using `_extract_answer_content`.
- Validates action format and feasibility.
- Updates positions, tile orientations, and available gadgets deterministically.
- Appends the action to `action_history` and `observations`.
- Checks terminal conditions; sets `terminated` and `winner` when satisfied.
- Returns updated observation and reward outcomes.
### `_generate_player_prompt(player_id)`
- Builds the full text prompt described above, tailored to the player's view of the current state.
- Queries `game_state` for position, gadgets, current turn, and visible grid.
- Appends example output section.
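
The state-dependent part of the prompt could be assembled roughly as follows. This is only a sketch: `build_status_lines` is a hypothetical helper, it covers just the per-turn status lines rather than the full prompt, and it assumes the `game_state` layout shown in Section 5.

```python
def build_status_lines(state, player_id):
    """Format the per-turn status portion of the prompt for one player."""
    me = state["player_states"][player_id]
    opponent = "B" if player_id == "A" else "A"
    x, y = me["position"]
    # Fall back to "none" when the player has no gadgets left (an assumption).
    gadgets = ", ".join(me["gadgets"]) or "none"
    return (
        f"Current Turn: {state['turn_number']}\n"
        f"You are Player {player_id}. Opponent is Player {opponent}.\n"
        f"Your position: ({x},{y})\n"
        f"Available gadgets: {gadgets}\n"
        "Respond with exactly one valid action token."
    )
```

The fixed text, such as the action list and the example responses, would be added around these lines to form the complete prompt.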
---
## 11. Copy-Check Against the Example
This design features a **completely unique environment**:
- **Theme:** Spatial navigation and puzzle solving (not negotiation or economy).
- **Terminology:** Explorers, relic, labyrinth, tiles, gadgets — none appear in the example.
- **Game mechanics:** Grid movement and tile transformation — unrelated to offers, deals, or trade.
- **State keys:** (`tiles`, `gadgets`, `relic`, `turn_number`, etc.) are original.
- **Prompt text** describes an exploration challenge, not an agreement or exchange.
Hence, *Labyrinth Conquest* satisfies the requirement to be a distinct, self-contained, deterministic, turn-based navigation environment.