Add environment documentation from Openverse builder

This commit is contained in:
Openverse Builder
2001-01-01 00:00:00 +00:00
parent 2ce0405988
commit 818d8d658f

environment.md Normal file

@@ -0,0 +1,223 @@
---
# **TIC-TAC-TRAIL: A Turn-Based Strategy Design Document**
---
## **1. Concept Paragraph**
**Concept Overview:**
*Tic-Tac-Trail* is a deterministic, turn-based tactical puzzle built on grid conquest; it has no negotiation or trade mechanics. Two explorers, **Team Sun** and **Team Moon**, compete to claim paths on an ancient 3×3 stone map. Each tile can be marked with a team's emblem (`Sun` or `Moon`, written on the board as `S` or `M`). The first expedition to align three of its emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple's power and wins. Core player commands are expressed as `[Mark:<row>,<col>]`, naming the grid position to claim, or `[Pass]` if no legal move remains. The environment tracks placement, board state, turn order, and victory conditions deterministically.
---
## **2. Roles and Win Condition**
- **Players:**
  - Player 1: *Team Sun* (symbol “S”)
  - Player 2: *Team Moon* (symbol “M”)
- **Objective:**
  Align three of one's own symbols (`S` or `M`) in a straight line (row, column, or diagonal) before the board fills.
- **Decision Rules:**
  - **Win:** First player to form an unbroken trio of their own emblem.
  - **Loss:** Opponent achieves a trio first.
  - **Draw:** All nine tiles filled without a winning alignment.
  - Once a win or draw occurs, the game becomes terminal and no further moves are accepted.
---
## **3. Turn Structure and Determinism**
- The game alternates turns strictly: *Sun → Moon → Sun → Moon*, and so on.
- Turn count begins at 1 and increments after each valid action.
- Maximum of nine turns (since there are nine cells).
- No random factors exist; the game is **fully deterministic**.
- A seed value is still stored in the state for reproducibility, but no game logic consumes it, so replaying the same action sequence always produces the same game.
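
The strict alternation above can be sketched as a pure function of the turn counter. This is a minimal illustration, not the environment's actual code: the names `PLAYERS`, `MAX_TURNS`, and `player_for_turn` are hypothetical and do not appear in the specification.

```python
PLAYERS = ("Sun", "Moon")  # hypothetical constant: Sun always moves first
MAX_TURNS = 9              # one turn per cell on the 3x3 board


def player_for_turn(turn_count: int) -> str:
    """Return the player to move on a 1-based turn number."""
    if not 1 <= turn_count <= MAX_TURNS:
        raise ValueError(f"turn_count must be in 1..{MAX_TURNS}, got {turn_count}")
    # Odd turns belong to Sun, even turns to Moon.
    return PLAYERS[(turn_count - 1) % 2]


# Full move order for a game that runs all nine turns.
order = [player_for_turn(t) for t in range(1, MAX_TURNS + 1)]
```

Because the mover is derived from `turn_count` alone, no random state is involved, which matches the determinism requirement.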
---
## **4. Action Grammar (Machine-Parseable)**
**Permitted Actions:**
### 4.1 Mark a Tile
- **Token Format:** `[Mark:<row>,<col>]`
- **Pattern (regex):** `^\[Mark:(0|1|2),(0|1|2)\]$`
- **Semantics:** The current player places their symbol on the specified cell `(row, col)` if it is empty.
**Examples:**
- **Valid:** `[Mark:0,2]`: the player marks the top-right cell.
- **Invalid:** `[Mark:3,1]`: row "3" is out of range (valid rows: 0-2).
- **Invalid:** `[Mark:1-2]`: the comma separator is missing (a hyphen is used instead).
### 4.2 Pass
- **Token Format:** `[Pass]`
- **Pattern (regex):** `^\[Pass\]$`
- **Semantics:** Used only if the player has no valid cell remaining. Because a full board ends the game in a draw (see Section 8), this should essentially never be legal in normal play.
**Examples:**
- **Valid:** `[Pass]`: the player skips the turn.
- **Invalid:** `[PASS]`: the token is case-sensitive and must match `[Pass]` exactly.
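
As a rough illustration of how the two patterns above could be applied, the sketch below parses an already-extracted token. The function name `parse_action` and its return convention (a `(row, col)` tuple, the string `"pass"`, or `None`) are assumptions made for this example, not part of the specification.

```python
import re
from typing import Optional, Tuple, Union

# The two patterns are copied from Sections 4.1 and 4.2.
MARK_RE = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_RE = re.compile(r"^\[Pass\]$")


def parse_action(token: str) -> Optional[Union[Tuple[int, int], str]]:
    """Return (row, col) for a mark, "pass" for a pass, or None if malformed."""
    # fullmatch is used so that a trailing newline is not silently accepted.
    mark = MARK_RE.fullmatch(token)
    if mark:
        return int(mark.group(1)), int(mark.group(2))
    if PASS_RE.fullmatch(token):
        return "pass"
    return None
```

Applied to the examples above, `[Mark:0,2]` parses to `(0, 2)`, while `[Mark:3,1]`, `[Mark:1-2]`, and `[PASS]` all yield `None`.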
---
## **5. Game State Schema**
```json
{
  "seed": 42,
  "turn_count": 1,
  "current_player": "Sun",
  "board_state": [
    ["_", "_", "_"],
    ["_", "_", "_"],
    ["_", "_", "_"]
  ],
  "player_symbols": {
    "Sun": "S",
    "Moon": "M"
  },
  "history": [
    {"player": "System", "message": "The ancient board awaits."}
  ],
  "winner": null,
  "status": "ongoing",
  "available_moves": [
    [0, 0], [0, 1], [0, 2],
    [1, 0], [1, 1], [1, 2],
    [2, 0], [2, 1], [2, 2]
  ],
  "scores": {
    "Sun": 0,
    "Moon": 0
  }
}
```
- Keys reflect a unique thematic world: the ancient “trail” board, emblems for “Sun” and “Moon,” and clear distinction from any negotiation-like schema.
---
## **6. Initialization Rules**
- When `reset(seed)` is called:
1. The RNG is seeded with `seed`, although no game logic draws from it.
2. The `board_state` is filled with `_` symbols representing empty stone tiles.
3. The first turn is always `Sun`.
4. The `history` log begins with a world description.
5. `available_moves` includes all `(row, col)` pairs.
- Observation: Both players receive an identical initial description and an empty-board visualization.
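
The initialization steps can be sketched as a function that builds the Section 5 schema from a seed. This is an illustrative sketch only: `make_initial_state` is a hypothetical helper name, and a real `reset(seed)` would also produce the players' observations.

```python
import random
from typing import Any, Dict


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Build a fresh state dictionary following the Section 5 schema."""
    # Seeded for reproducibility only; nothing in the game draws from it.
    random.Random(seed)
    return {
        "seed": seed,
        "turn_count": 1,
        "current_player": "Sun",
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [{"player": "System", "message": "The ancient board awaits."}],
        "winner": None,
        "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }
```

Each board row is built with its own list so that marking one cell does not alias into the other rows.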
---
## **7. Validation and Error Handling**
- **Extraction:**
The environment will extract content from within `\boxed{{}}` using `_extract_answer_content(action)`.
- **Validation Steps:**
1. Verify action string matches one of the two regex patterns.
2. If `[Mark:<r>,<c>]`, check:
- 0 ≤ r,c ≤ 2
- Corresponding cell is unoccupied (`"_"`).
3. If `[Pass]`, ensure no playable cells remain; otherwise invalid.
- **Invalid Reasons (examples):**
- "Invalid format — must be [Mark:r,c] or [Pass]."
- "Chosen cell already occupied."
- "Row or column index out of range."
- "Cannot pass while moves still available."
If the action is invalid, the system invokes `set_invalid_move(reason)`, and forfeit logic may apply depending on the higher-level controller.
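
One way the validation steps might be wired together is sketched below. The function `validate_action` and its convention of returning `None` for a legal action are assumptions for illustration. Note one deliberate deviation: the strict Section 4.1 pattern would reject `[Mark:3,1]` as a format error, so this sketch uses a looser, hypothetical pattern that accepts any digits in order to report the separate out-of-range reason.

```python
import re
from typing import List, Optional

# Hypothetical loose pattern: any digits, so range errors get their own message.
LOOSE_MARK_RE = re.compile(r"\[Mark:([0-9]+),([0-9]+)\]")
PASS_RE = re.compile(r"\[Pass\]")


def validate_action(content: str, board: List[List[str]]) -> Optional[str]:
    """Return None if the action is legal, otherwise the invalid-move reason."""
    has_empty_cell = any(cell == "_" for row in board for cell in row)
    if PASS_RE.fullmatch(content):
        return "Cannot pass while moves still available." if has_empty_cell else None
    mark = LOOSE_MARK_RE.fullmatch(content)
    if mark is None:
        # \u2014 is the dash character used in the spec's message text.
        return "Invalid format \u2014 must be [Mark:r,c] or [Pass]."
    row, col = int(mark.group(1)), int(mark.group(2))
    if not (0 <= row <= 2 and 0 <= col <= 2):
        return "Row or column index out of range."
    if board[row][col] != "_":
        return "Chosen cell already occupied."
    return None
```

A caller would pass any non-`None` result to `set_invalid_move(reason)`.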
---
## **8. Terminal Conditions and Scoring**
**Checks performed after each valid move:**
1. **Win Check:**
If the current player owns three symbols aligned horizontally, vertically, or diagonally:
- `winner = current_player`
- `status = "finished"`
- `scores[current_player] = 1`
- Opponent receives 0.
2. **Draw Check:**
If all cells filled and no winner:
- `winner = null`
- `status = "draw"`
- Both scores = 0.5.
3. **Continue Otherwise:**
- `status = "ongoing"`
- Proceed to next player.
**Tie-Break:**
None beyond declared draw; equal scoring applies.
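
The three checks can be sketched as a single function run after each valid move. The name `evaluate_after_move` and the returned dictionary shape are assumptions made for this example; the status strings and score values follow the rules listed above.

```python
from typing import Any, Dict, List

# All eight winning lines on a 3x3 board: three rows, three columns, two diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def evaluate_after_move(
    board: List[List[str]], current_player: str, player_symbols: Dict[str, str]
) -> Dict[str, Any]:
    """Return winner, status, and scores after current_player has just moved."""
    symbol = player_symbols[current_player]
    opponent = next(p for p in player_symbols if p != current_player)
    # 1. Win check: only the player who just moved can have completed a line.
    if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
        return {"winner": current_player, "status": "finished",
                "scores": {current_player: 1.0, opponent: 0.0}}
    # 2. Draw check: the board is full and nobody has won.
    if all(cell != "_" for row in board for cell in row):
        return {"winner": None, "status": "draw",
                "scores": {p: 0.5 for p in player_symbols}}
    # 3. Otherwise the game continues.
    return {"winner": None, "status": "ongoing",
            "scores": {p: 0.0 for p in player_symbols}}
```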
---
## **9. Player Prompt Specification**
**Prompt Identity and Instructions:**
Each turn's prompt should contain:
1. A brief world intro:
“You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
2. The current board visualization (3×3 grid of `_`, `S`, `M`).
3. The list of allowed action formats:
- `[Mark:<row>,<col>]` where `<row>` and `<col>` are integers 0-2.
- `[Pass]` if no unclaimed tiles remain.
4. Reminder of victory condition: “Align three of your emblems in a straight line.”
5. Rule reminder: “All actions must be enclosed in `\boxed{{}}` at the end of your message.”
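
A minimal sketch of how these five prompt elements could be assembled is given below. The helper names `render_board` and `build_player_prompt` are hypothetical, and the exact layout (one row per line, cells separated by spaces) is an assumption; the quoted sentences are taken from the list above.

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the 3x3 grid as three lines of space-separated cells."""
    return "\n".join(" ".join(row) for row in board)


def build_player_prompt(player: str, board: List[List[str]]) -> str:
    """Assemble the per-turn prompt for 'Sun' or 'Moon'."""
    lines = [
        f"You are an explorer representing Team {player} "
        "claiming tiles on the ancient Tic-Tac-Trail.",
        "Current board:",
        render_board(board),
        "Allowed actions:",
        "- [Mark:<row>,<col>] where <row> and <col> are integers 0-2.",
        "- [Pass] if no unclaimed tiles remain.",
        "Align three of your emblems in a straight line.",
        # Raw string so that the backslash in \boxed is kept literally.
        r"All actions must be enclosed in \boxed{{}} at the end of your message.",
    ]
    return "\n".join(lines)
```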
**Few-shot examples:**
```
Example valid response:
I should take the center stone before my rival.
\boxed{{[Mark:1,1]}}
```
```
Example invalid response (wrong format):
\boxed{{Mark:1,1}} <-- Missing brackets [ ]
```
```
Example valid response (board full, passing):
No moves left, I will pass.
\boxed{{[Pass]}}
```
**Extraction Function Notice:**
`_extract_answer_content(self, action: str) -> str` will strip `\boxed{{}}` syntax and return internal content for validation.
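
The specification names the extraction function but does not define its body. The sketch below is one possible standalone interpretation, not the actual implementation: it accepts either single or doubled braces after `\boxed`, takes the last boxed span in the message, and falls back to the stripped input when no boxed span is found. All three behaviours are assumptions.

```python
import re

# Matches \boxed{...} or \boxed{{...}}; the captured content may not contain braces.
BOXED_RE = re.compile(r"\\boxed\{+([^{}]*)\}+")


def extract_answer_content(action: str) -> str:
    """Return the content of the last \\boxed span, or the stripped input if none."""
    matches = BOXED_RE.findall(action)
    if matches:
        return matches[-1].strip()
    return action.strip()
```

With this sketch, the valid few-shot response yields `[Mark:1,1]`, and the invalid one yields `Mark:1,1`, which then fails the grammar check during validation.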
---
## **10. API Mapping Plan**
| Method | Purpose | Operations on Game State | Output |
|--------|----------|--------------------------|--------|
| **`reset(seed)`** | Initialize the game | Sets all keys per the schema, seeds the RNG, clears the board, assigns the first player (`Sun`), populates `available_moves`, and generates the initial system message | Returns initial `observations` for both players |
| **`step(player_action)`** | Process one player's move | 1. Extract content with `_extract_answer_content` 2. Validate grammar & legality 3. If valid, apply to `board_state` 4. Append to `history` 5. Update `available_moves` 6. Check win/draw conditions, adjust scores, and advance turn | Returns updated `observations`, reward info, `done` flag |
| **`_generate_player_prompt(player_id)`** | Builds textual context for that player | Uses the current `board_state`, `turn_count`, and list of legal actions. Demonstrates correct formatting. | Returns formatted prompt string instructing the player to end with a `\boxed{{}}` action |
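
Operations 3 to 5 of `step`, plus the turn hand-over, might look like the sketch below once an action has already been extracted and validated. `apply_mark` is a hypothetical helper, the history message format is an assumption, and the win/draw check and scoring from Section 8 are deliberately left out to keep the example short.

```python
from typing import Any, Dict


def apply_mark(state: Dict[str, Any], row: int, col: int) -> None:
    """Apply an already-validated mark in place, then hand the turn over."""
    player = state["current_player"]
    state["board_state"][row][col] = state["player_symbols"][player]
    # Assumed log format: the raw action token as the message.
    state["history"].append({"player": player, "message": f"[Mark:{row},{col}]"})
    state["available_moves"].remove([row, col])
    state["turn_count"] += 1
    state["current_player"] = "Moon" if player == "Sun" else "Sun"


# Usage on a minimal state containing only the keys this helper touches.
state = {
    "current_player": "Sun",
    "turn_count": 1,
    "board_state": [["_"] * 3 for _ in range(3)],
    "player_symbols": {"Sun": "S", "Moon": "M"},
    "history": [],
    "available_moves": [[r, c] for r in range(3) for c in range(3)],
}
apply_mark(state, 1, 1)
```

In a full `step`, the terminal check would run before the turn is advanced, so that a finished game is never handed to the next player.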
---
## **11. Copy-Check Against the Example**
This design is **fully distinct** from any negotiation or resource-trading environment.
- **Theme:** Archaeological puzzle arena (grid conquest), not negotiation.
- **Objectives:** Claim territory and form a line, not reach mutual agreements.
- **Entities:** Ancient stones, Sun and Moon symbols—not participants in a deal.
- **Game State Keys:** `board_state`, `player_symbols`, `available_moves`, and `scores`—entirely original.
- **Prompt Text:** References *Tic-Tac-Trail* and ancient exploration, not disputes or offers.
Therefore, this specification represents a fully self-contained, original turn-based environment for a deterministic **tic-tac-toe-style strategy challenge**, compliant with the TextArena architecture.
---