Update README.md
# TicTacToe-v0

Environment created via Openverse

### Overview

TicTacToe is a classic two-player strategy game played on a 3x3 grid. Each player uses one mark, `X` or `O`, and the first to align three of their marks horizontally, vertically, or diagonally wins. The game tests a player's ability to anticipate, block, and plan ahead, which makes it a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).

---

### Gameplay

* **Players:** 2
* **Symbols:** `X` and `O`
* **Objective:** Form a line of three of your symbols before your opponent.
* **Board Layout:**

```
 0 | 1 | 2
---+---+---
 3 | 4 | 5
---+---+---
 6 | 7 | 8
```

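As a rough illustration of the numbering above, the sketch below maps a cell index to its row and column and checks whether a move is legal. The helper names (`index_to_row_col`, `is_valid_move`, `apply_move`) and the flat nine-element list used for the board are assumptions made for this example; they are not taken from the environment's actual implementation.

```python
# Hypothetical helpers for illustration only; not the environment's API.
EMPTY = " "


def index_to_row_col(index: int) -> tuple[int, int]:
    """Map a cell index (0-8) to (row, column) on the 3x3 grid."""
    return divmod(index, 3)


def is_valid_move(board: list[str], index: int) -> bool:
    """A move is legal if the index is on the board and the cell is empty."""
    return 0 <= index <= 8 and board[index] == EMPTY


def apply_move(board: list[str], index: int, symbol: str) -> list[str]:
    """Return a new board with `symbol` at `index`; reject illegal moves."""
    if not is_valid_move(board, index):
        raise ValueError(f"illegal move: {index}")
    new_board = board.copy()
    new_board[index] = symbol
    return new_board


board = [EMPTY] * 9
board = apply_move(board, 4, "X")   # X takes the centre cell
print(index_to_row_col(4))          # (1, 1)
print(is_valid_move(board, 4))      # False, the cell is already taken
```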
Players take turns selecting a cell by its index (0–8). The environment automatically validates moves and announces wins, losses, or draws.

---

### Environment Details

* **Environment Name:** `TicTacToe-v0`
* **Number of Players:** 2
* **Observation Type:** Text-based description of board state and game messages
* **Action Type:** Integer index (0–8)
* **Winning Condition:** Three identical symbols in a row, column, or diagonal
* **Termination:** When a player wins or all cells are filled (draw)
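
The winning and termination conditions listed above can be sketched as follows. This is a minimal illustration that assumes the board is a flat list of nine strings with `" "` marking an empty cell; `WIN_LINES`, `winner`, and `game_status` are hypothetical names, not part of the environment.

```python
from typing import Optional

# The eight winning lines on the 0-8 grid: three rows, three columns, two diagonals.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board: list[str]) -> Optional[str]:
    """Return "X" or "O" if that symbol fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def game_status(board: list[str]) -> str:
    """Classify a position as a win, a draw, or still in progress."""
    won_by = winner(board)
    if won_by is not None:
        return f"{won_by} wins"
    if all(cell != " " for cell in board):
        return "draw"
    return "in progress"


print(game_status(list("XXXOO    ")))  # X wins (top row)
print(game_status(list("XOXXOOOXX")))  # draw (board full, no line)
print(game_status([" "] * 9))          # in progress
```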
---

### LLM Evaluation Purpose

TicTacToe serves as a benchmark for:

* **Strategic reasoning:** planning moves and anticipating outcomes
* **Opponent modeling:** predicting and countering adversarial play
* **Deterministic decision-making:** consistent performance under clear rules

It is also a good starting environment for reinforcement learning or self-play fine-tuning of small or large language models.
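
To make the self-play idea concrete, here is a small sketch in which two uniformly random policies play each other and the move sequence and outcome of each game are recorded. The random policies are stand-ins for a language model or learned agent, and `play_random_game` and `has_won` are hypothetical names for this example; none of this reflects the environment's real interface.

```python
import random

# Winning lines on the 0-8 grid: rows, columns, diagonals.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def has_won(board: list[str], symbol: str) -> bool:
    """True if `symbol` occupies all three cells of any winning line."""
    return any(all(board[i] == symbol for i in line) for line in WIN_LINES)


def play_random_game(rng: random.Random) -> tuple[list[int], str]:
    """Play one game between two random policies.

    Returns the list of chosen cell indices and the outcome:
    "X", "O", or "draw".
    """
    board = [" "] * 9
    moves: list[int] = []
    symbol = "X"  # X moves first in this sketch
    while True:
        legal = [i for i, cell in enumerate(board) if cell == " "]
        if not legal:
            return moves, "draw"
        move = rng.choice(legal)
        board[move] = symbol
        moves.append(move)
        if has_won(board, symbol):
            return moves, symbol
        symbol = "O" if symbol == "X" else "X"


rng = random.Random(0)  # fixed seed so runs are repeatable
outcomes = [play_random_game(rng)[1] for _ in range(1000)]
print({result: outcomes.count(result) for result in ("X", "O", "draw")})
```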