# **TIC-TAC-TRAIL: A Turn-Based Strategy Design Document**

---
## **1. Concept Paragraph**

**Concept Overview:**

*Tic-Tac-Trail* is a deterministic, turn-based tactical puzzle built on grid conquest; it has no negotiation or trade mechanics. Two explorer teams, **Team Sun** and **Team Moon**, compete to claim tiles on an ancient 3×3 stone map. On each turn, the active team marks one empty tile with its emblem (`S` for Sun, `M` for Moon). The first team to align three of its emblems in a continuous line (horizontal, vertical, or diagonal) awakens the temple’s power and wins. Players issue commands as `[Mark:<row>,<col>]` to claim the tile at that grid position, or `[Pass]` when no legal move remains. The environment tracks tile placement, board state, turn order, and victory conditions deterministically.

---
## **2. Roles and Win Condition**

- **Players:**
  - Player 1: *Team Sun* (symbol “S”)
  - Player 2: *Team Moon* (symbol “M”)

- **Objective:**
  Align three of one’s symbols (`S` or `M`) in a straight line (row, column, or diagonal) before the board fills.

- **Decision Rules:**
  - **Win:** First player to form an unbroken trio of their own emblem.
  - **Loss:** Opponent achieves a trio first.
  - **Draw:** All nine tiles filled without a winning alignment.
  - Once a win or draw occurs, the game becomes terminal and no further moves are accepted.

---
## **3. Turn Structure and Determinism**

- Turns alternate strictly: *Sun → Moon → Sun → Moon*, and so on.
- `turn_count` starts at 1 and increments after each valid action.
- At most nine turns are played, one per cell.
- There are no random factors; the game is **fully deterministic**.
- A `seed` value is still stored in state for interface consistency, but it is never used; replays are reproducible because the rules contain no randomness.

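
Because turns alternate strictly and `turn_count` starts at 1, the active team can be derived from the counter alone. A minimal Python sketch; `player_for_turn` is a hypothetical helper name, not part of the specified API:

```python
def player_for_turn(turn_count: int) -> str:
    """Return the team that acts on a given 1-indexed turn.

    Sun moves on odd turns and Moon on even turns, so a game that fills
    the whole board (nine turns) gives Sun the fifth and final move.
    """
    if not 1 <= turn_count <= 9:
        raise ValueError("turn_count must be between 1 and 9")
    return "Sun" if turn_count % 2 == 1 else "Moon"


# Turn order for a game that fills every cell.
sequence = [player_for_turn(t) for t in range(1, 10)]
print(sequence)
```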
---
## **4. Action Grammar (Machine-Parseable)**

**Permitted Actions:**

### 4.1 Mark a Tile

- **Token Format:** `[Mark:<row>,<col>]`
- **Pattern (regex):** `^\[Mark:(0|1|2),(0|1|2)\]$`
- **Semantics:** The current player places their symbol on cell `(row, col)` if it is empty. Indices are 0-based, with `(0, 0)` at the top-left.

**Examples:**

- ✅ **Valid:** `[Mark:0,2]`: the player marks the top-right cell.
- ❌ **Invalid:** `[Mark:3,1]`: row `3` is out of range (valid rows: 0–2).
- ❌ **Invalid:** `[Mark:1-2]`: uses `-` instead of the required comma separator.

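
The Mark grammar can be checked with Python's `re` module. This sketch reuses the pattern above verbatim; `parse_mark` is a hypothetical name. It calls `fullmatch` rather than `match` because `$` in Python also matches just before a trailing newline, and `fullmatch` rejects that case:

```python
import re
from typing import Optional, Tuple

# Pattern copied from the grammar above.
MARK_PATTERN = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")


def parse_mark(token: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) for a well-formed Mark token, or None otherwise."""
    match = MARK_PATTERN.fullmatch(token)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for token in ["[Mark:0,2]", "[Mark:3,1]", "[Mark:1-2]"]:
    print(token, "->", parse_mark(token))
```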
### 4.2 Pass

- **Token Format:** `[Pass]`
- **Pattern (regex):** `^\[Pass\]$`
- **Semantics:** Allowed only when the player has no empty cell left to claim. Under the terminal rules in Section 8 this state never arises during play: the move that fills the ninth cell ends the game as a win or a draw before another turn begins. The token is kept so that the grammar is complete.

**Examples:**

- ✅ **Valid:** `[Pass]`: a well-formed pass token.
- ❌ **Invalid:** `[PASS]`: the token is case-sensitive and must match `[Pass]` exactly.

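
A short sketch of the two Pass rules, syntax and legality, assuming the board is held as a list of rows using `_` for empty tiles (as in the schema). `is_pass` and `pass_is_legal` are hypothetical helper names:

```python
import re
from typing import List

PASS_PATTERN = re.compile(r"^\[Pass\]$")


def is_pass(token: str) -> bool:
    """True only for the exact, case-sensitive token [Pass]."""
    return PASS_PATTERN.fullmatch(token) is not None


def pass_is_legal(board: List[List[str]]) -> bool:
    """A pass is legal only when no '_' (empty) tile remains."""
    return all(cell != "_" for row in board for cell in row)


print(is_pass("[Pass]"), is_pass("[PASS]"))
```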
---
## **5. Game State Schema**

```json
{
  "seed": 42,
  "turn_count": 1,
  "current_player": "Sun",
  "board_state": [
    ["_", "_", "_"],
    ["_", "_", "_"],
    ["_", "_", "_"]
  ],
  "player_symbols": {
    "Sun": "S",
    "Moon": "M"
  },
  "history": [
    {"player": "System", "message": "The ancient board awaits."}
  ],
  "winner": null,
  "status": "ongoing",
  "available_moves": [
    [0, 0], [0, 1], [0, 2],
    [1, 0], [1, 1], [1, 2],
    [2, 0], [2, 1], [2, 2]
  ],
  "scores": {
    "Sun": 0,
    "Moon": 0
  }
}
```

- Key names follow the game’s theme (the ancient trail board and the Sun and Moon emblems) and are clearly distinct from any negotiation-style schema.
- `"_"` marks an empty tile. `status` is `"ongoing"` while play continues, then `"finished"` after a win or `"draw"` after a full board with no winner (see Section 8).

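
`available_moves` is fully determined by `board_state`, so one way to keep the two keys consistent is to recompute the list after every move. A minimal Python sketch, assuming state is held as a plain dict mirroring the JSON above; `derive_available_moves` is a hypothetical helper name:

```python
from typing import List

EMPTY = "_"


def derive_available_moves(board_state: List[List[str]]) -> List[List[int]]:
    """List every empty cell as [row, col], in row-major order."""
    return [
        [r, c]
        for r, row in enumerate(board_state)
        for c, cell in enumerate(row)
        if cell == EMPTY
    ]


state = {
    "board_state": [["S", "_", "_"], ["_", "M", "_"], ["_", "_", "_"]],
    "available_moves": [],
}
state["available_moves"] = derive_available_moves(state["board_state"])
print(state["available_moves"])
```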
---
## **6. Initialization Rules**

- When `reset(seed)` is called:
  1. The RNG is seeded with `seed` (kept for interface consistency; no game logic draws from it).
  2. Every cell of `board_state` is set to `_`, representing an empty stone tile.
  3. `current_player` is set to `Sun`, who always moves first.
  4. The `history` log begins with a world description.
  5. `available_moves` lists all nine `(row, col)` pairs.
  6. `turn_count`, `winner`, `status`, and `scores` take the values shown in the Section 5 schema (`1`, `null`, `"ongoing"`, and `0` for each team).
- Observation: both players receive the same initial description and an empty board visualization.

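
The initialization steps can be sketched as a single Python function that returns a dict in the Section 5 shape. `make_initial_state` is a hypothetical name; seeding Python's global `random` module stands in for "the RNG" and is an assumption. Serializing `make_initial_state(42)` with `json.dumps` should reproduce the Section 5 example.

```python
import random
from typing import Any, Dict


def make_initial_state(seed: int) -> Dict[str, Any]:
    """Build a fresh game state matching the Section 5 schema."""
    random.seed(seed)  # seeded for interface consistency; nothing draws from it
    return {
        "seed": seed,
        "turn_count": 1,
        "current_player": "Sun",  # Sun always moves first
        # Build each row separately so rows are not aliased to one list.
        "board_state": [["_"] * 3 for _ in range(3)],
        "player_symbols": {"Sun": "S", "Moon": "M"},
        "history": [{"player": "System", "message": "The ancient board awaits."}],
        "winner": None,
        "status": "ongoing",
        "available_moves": [[r, c] for r in range(3) for c in range(3)],
        "scores": {"Sun": 0, "Moon": 0},
    }
```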
---
## **7. Validation and Error Handling**

- **Extraction:**
  The environment extracts the content inside `\boxed{}` using `_extract_answer_content(action)`.

- **Validation Steps:**
  1. Verify that the extracted string matches one of the two regex patterns.
  2. If it is `[Mark:<r>,<c>]`, check that:
     - 0 ≤ r, c ≤ 2 (already guaranteed by the regex; kept as a defensive check).
     - The corresponding cell is unoccupied (`"_"`).
  3. If it is `[Pass]`, ensure no playable cells remain; otherwise the action is invalid.

- **Invalid Reasons (examples):**
  - "Invalid format: must be [Mark:r,c] or [Pass]."
  - "Chosen cell already occupied."
  - "Row or column index out of range."
  - "Cannot pass while moves still available."

If an action is invalid, the environment calls `set_invalid_move(reason)`; whether this forfeits the game is decided by the higher-level controller.

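
The validation steps can be sketched as one function that returns `None` for a legal action and otherwise a reason string that a caller could pass to `set_invalid_move(reason)`. The name `validate_action` and the return convention are assumptions, and the input is assumed to be already extracted from `\boxed{}`. Because the regex restricts indices to 0–2, an input such as `[Mark:3,1]` is reported as a format error here:

```python
import re
from typing import List, Optional

MARK_PATTERN = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS_PATTERN = re.compile(r"^\[Pass\]$")


def validate_action(content: str, board: List[List[str]]) -> Optional[str]:
    """Return None if the action is legal, else a human-readable reason."""
    mark = MARK_PATTERN.fullmatch(content)
    if mark is not None:
        row, col = int(mark.group(1)), int(mark.group(2))
        if board[row][col] != "_":
            return "Chosen cell already occupied."
        return None
    if PASS_PATTERN.fullmatch(content) is not None:
        if any(cell == "_" for r in board for cell in r):
            return "Cannot pass while moves still available."
        return None
    # Out-of-range indices such as [Mark:3,1] fail the regex and land here.
    return "Invalid format: must be [Mark:r,c] or [Pass]."
```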
---
## **8. Terminal Conditions and Scoring**

**Checks performed after each valid move:**

1. **Win Check:**
   If the current player has three symbols aligned horizontally, vertically, or diagonally:
   - `winner = current_player`
   - `status = "finished"`
   - `scores[current_player] = 1`
   - The opponent’s score stays 0.

2. **Draw Check:**
   If all cells are filled and there is no winner:
   - `winner = null`
   - `status = "draw"`
   - Both scores are set to 0.5.

3. **Continue Otherwise:**
   - `status = "ongoing"`
   - Pass the turn to the other player.

The win check runs before the draw check because the ninth move can fill the board and complete a line at the same time; that case counts as a win.

**Tie-Break:**
None beyond the declared draw; both players receive equal scores.

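
A Python sketch of the three checks, assuming the last mover’s team and symbol are passed in explicitly. The name `evaluate` and its `(status, winner, scores)` return shape are illustrative, not part of the specified API:

```python
from typing import Dict, List, Optional, Tuple

# The eight winning lines on a 3x3 board: 3 rows, 3 columns, 2 diagonals.
LINES: List[List[Tuple[int, int]]] = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def evaluate(
    board: List[List[str]], player: str, symbol: str
) -> Tuple[str, Optional[str], Dict[str, float]]:
    """Apply the Section 8 checks after `player` has placed `symbol`.

    Returns (status, winner, scores). The win check runs first, so a ninth
    move that completes a line is scored as a win, not a draw.
    """
    opponent = "Moon" if player == "Sun" else "Sun"
    if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
        return "finished", player, {player: 1.0, opponent: 0.0}
    if all(cell != "_" for row in board for cell in row):
        return "draw", None, {"Sun": 0.5, "Moon": 0.5}
    return "ongoing", None, {"Sun": 0.0, "Moon": 0.0}
```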
---
## **9. Player Prompt Specification**

**Prompt Identity and Instructions:**

Each turn’s prompt should contain:

1. A brief world intro:
   “You are an explorer representing Team Sun (or Team Moon) claiming tiles on the ancient Tic-Tac-Trail.”
2. The current board visualization (a 3×3 grid of `_`, `S`, `M`).
3. The list of allowed action formats:
   - `[Mark:<row>,<col>]`, where `<row>` and `<col>` are integers 0–2.
   - `[Pass]`, only if no unclaimed tiles remain.
4. A reminder of the victory condition: “Align three of your emblems in a straight line.”
5. A rule reminder: “All actions must be enclosed in `\boxed{}` at the end of your message.”

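
A sketch of how a prompt builder might assemble the five components in order. `render_board` and `build_prompt` are hypothetical names, and the one-row-per-line board rendering is an assumption; the specification fixes only which pieces the prompt contains:

```python
from typing import List


def render_board(board: List[List[str]]) -> str:
    """Render the grid one row per line, e.g. 'S _ M'."""
    return "\n".join(" ".join(row) for row in board)


def build_prompt(team: str, board: List[List[str]]) -> str:
    """Assemble the five prompt components listed above, in order."""
    parts = [
        f"You are an explorer representing Team {team} claiming tiles "
        "on the ancient Tic-Tac-Trail.",
        "Current board:\n" + render_board(board),
        "Allowed actions:\n"
        "- [Mark:<row>,<col>] where <row> and <col> are integers 0-2.\n"
        "- [Pass] if no unclaimed tiles remain.",
        "Align three of your emblems in a straight line.",
        "All actions must be enclosed in \\boxed{} at the end of your message.",
    ]
    return "\n\n".join(parts)


print(build_prompt("Sun", [["S", "_", "_"], ["_", "M", "_"], ["_", "_", "_"]]))
```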
**Few-shot examples:**

```
Example valid response:
I should take the center stone before my rival.
\boxed{[Mark:1,1]}
```

```
Example invalid response (wrong format):
\boxed{Mark:1,1}   <-- missing square brackets [ ]
```

```
Example pass response (legal only when no unclaimed tiles remain):
No moves left, I will pass.
\boxed{[Pass]}
```

**Extraction Function Notice:**
`_extract_answer_content(self, action: str) -> str` strips the `\boxed{}` wrapper and returns the inner content for validation.

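
A standalone sketch of the extraction step. The specification gives only the signature, so three behaviors here are assumptions: the last `\boxed{...}` in the message wins (the prompt asks for the action at the end), nested braces are not supported, and text without a box is returned stripped so that it then fails validation as a format error:

```python
import re

# Literal "\boxed{" followed by brace-free content and a closing "}".
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer_content(action: str) -> str:
    """Return the content of the final \\boxed{...}, or the stripped text if none."""
    matches = BOXED.findall(action)
    return matches[-1].strip() if matches else action.strip()
```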
---
## **10. API Mapping Plan**

| Method | Purpose | Operations on Game State | Output |
|--------|---------|--------------------------|--------|
| **`reset(seed)`** | Initialize the game | Sets every schema key, stores `seed`, clears the board, assigns the first player (`Sun`), populates `available_moves`, and adds the initial system message to `history` | Initial `observations` for both players |
| **`step(player_action)`** | Process one player's move | (1) Extract content with `_extract_answer_content`; (2) validate grammar and legality; (3) if valid, apply the move to `board_state`; (4) append to `history`; (5) update `available_moves`; (6) check win/draw conditions, adjust scores, and advance the turn | Updated `observations`, reward info, and a `done` flag |
| **`_generate_player_prompt(player_id)`** | Build the textual context for that player | Uses the current `board_state`, `turn_count`, and list of legal actions, and demonstrates correct formatting | Formatted prompt string instructing the player to end with a `\boxed{}` action |

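
The `reset`/`step` flow in the table can be sketched as one self-contained Python class. This is not a TextArena subclass and does not model observations: `TicTacTrailSketch` is a hypothetical name, `step` returns `(done, invalid_reason)` instead of calling `set_invalid_move(reason)`, and the "Game is already over." message is an assumed reason for moves sent after a terminal state:

```python
import re
from typing import Any, Dict, Optional, Tuple

MARK = re.compile(r"^\[Mark:(0|1|2),(0|1|2)\]$")
PASS = re.compile(r"^\[Pass\]$")
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")
# Eight winning lines: three rows, three columns, two diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


class TicTacTrailSketch:
    """Standalone sketch of the reset/step flow (not a TextArena subclass)."""

    def reset(self, seed: int) -> Dict[str, Any]:
        """Build the initial state from the Section 5 schema."""
        self.state: Dict[str, Any] = {
            "seed": seed,  # stored only; the rules never draw random numbers
            "turn_count": 1,
            "current_player": "Sun",
            "board_state": [["_"] * 3 for _ in range(3)],
            "player_symbols": {"Sun": "S", "Moon": "M"},
            "history": [{"player": "System", "message": "The ancient board awaits."}],
            "winner": None,
            "status": "ongoing",
            "available_moves": [[r, c] for r in range(3) for c in range(3)],
            "scores": {"Sun": 0, "Moon": 0},
        }
        return self.state

    def _extract_answer_content(self, action: str) -> str:
        """Content of the last \\boxed{...}, or the raw text if none is present."""
        found = BOXED.findall(action)
        return found[-1].strip() if found else action.strip()

    def step(self, player_action: str) -> Tuple[bool, Optional[str]]:
        """Apply one action and return (done, invalid_reason)."""
        s = self.state
        if s["status"] != "ongoing":
            return True, "Game is already over."
        player = s["current_player"]
        board = s["board_state"]
        content = self._extract_answer_content(player_action)
        if PASS.fullmatch(content):
            # While the game is ongoing at least one tile is empty,
            # so a pass is always rejected here (see Section 4.2).
            return False, "Cannot pass while moves still available."
        mark = MARK.fullmatch(content)
        if mark is None:
            return False, "Invalid format: must be [Mark:r,c] or [Pass]."
        row, col = int(mark.group(1)), int(mark.group(2))
        if board[row][col] != "_":
            return False, "Chosen cell already occupied."
        symbol = s["player_symbols"][player]
        board[row][col] = symbol
        s["history"].append({"player": player, "message": content})
        s["available_moves"].remove([row, col])
        s["turn_count"] += 1
        other = "Moon" if player == "Sun" else "Sun"
        if any(all(board[r][c] == symbol for r, c in line) for line in LINES):
            s["winner"], s["status"] = player, "finished"
            s["scores"] = {player: 1, other: 0}
        elif not s["available_moves"]:
            s["status"] = "draw"
            s["scores"] = {"Sun": 0.5, "Moon": 0.5}
        else:
            s["current_player"] = other
        return s["status"] != "ongoing", None
```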
---
## **11. Copy-Check Against the Example**

This design is **fully distinct** from any negotiation or resource-trading environment.

- **Theme:** Archaeological puzzle arena (grid conquest), not negotiation.
- **Objectives:** Claim territory and form a line, not reach mutual agreements.
- **Entities:** Ancient stones and the Sun and Moon symbols, not participants in a deal.
- **Game State Keys:** `board_state`, `player_symbols`, `available_moves`, and `scores`, all specific to this design.
- **Prompt Text:** References *Tic-Tac-Trail* and ancient exploration, not disputes or offers.

Therefore, this specification describes a self-contained, original turn-based environment for a deterministic **tic-tac-toe-style strategy challenge**, compliant with the TextArena architecture.

---