TicTacToe-v0/README.md
2025-11-08 16:29:43 +00:00

# TicTacToe-v0
### Overview
TicTacToe is a classic two-player strategy game played on a 3x3 grid. The goal is to be the first player to align three of your marks, either `X` or `O`, horizontally, vertically, or diagonally. This simple yet elegant game tests players' ability to anticipate, block, and plan moves ahead, making it a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).
---
### Gameplay
* **Players:** 2
* **Symbols:** `X` and `O`
* **Objective:** Form a line of three of your symbols before your opponent does.
* **Board Layout:**
```
0 | 1 | 2
---+---+---
3 | 4 | 5
---+---+---
6 | 7 | 8
```
Players take turns selecting a cell by its index (0-8). The environment automatically validates moves and announces wins, losses, or draws.
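
The index scheme and move validation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the environment's actual implementation: the names `render_board`, `is_valid_move`, and `index_to_row_col` are hypothetical, and the board is assumed to be a 9-element list holding `"X"`, `"O"`, or `None` for an empty cell. Empty cells are drawn as their index, matching the layout shown earlier.

```python
def index_to_row_col(index):
    """Map a cell index 0-8 to (row, column) on the 3x3 grid."""
    return divmod(index, 3)


def is_valid_move(cells, index):
    """A move is valid if the index is an integer in 0-8 and the cell is empty."""
    return isinstance(index, int) and 0 <= index <= 8 and cells[index] is None


def render_board(cells):
    """Render the board as text; empty cells are shown as their index."""
    rows = []
    for r in range(3):
        row = [cells[3 * r + c] or str(3 * r + c) for c in range(3)]
        rows.append(" | ".join(row))
    return "\n---+---+---\n".join(rows)


# Hypothetical usage: X has taken the centre cell.
cells = [None] * 9
cells[4] = "X"
print(render_board(cells))
print(is_valid_move(cells, 4))  # the centre is occupied, so this is False
```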
---
### Environment Details
* **Environment Name:** `TicTacToe-v0`
* **Number of Players:** 2
* **Observation Type:** Text-based description of board state and game messages
* **Action Type:** Integer index (0-8)
* **Winning Condition:** Three identical symbols in a row, column, or diagonal
* **Termination:** When a player wins or all cells are filled (draw)
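
The winning and termination conditions listed above amount to checking eight fixed lines and then testing for a full board. The following is a minimal sketch under the same assumed board representation (a 9-element list of `"X"`, `"O"`, or `None`); `WIN_LINES`, `winner`, and `game_status` are hypothetical names and are not taken from the environment's source.

```python
# The eight index triples that form a row, column, or diagonal.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(cells):
    """Return "X" or "O" if that symbol fills a line, otherwise None."""
    for a, b, c in WIN_LINES:
        if cells[a] is not None and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return None


def game_status(cells):
    """Classify a position as a win, a draw, or still in progress."""
    w = winner(cells)
    if w is not None:
        return f"{w} wins"
    if all(cell is not None for cell in cells):
        return "draw"
    return "in progress"


# Hypothetical usage: X holds the main diagonal (cells 0, 4, 8).
print(game_status(["X", "O", None, None, "X", "O", None, None, "X"]))
```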
---
### LLM Evaluation Purpose
TicTacToe serves as a benchmark for:
* **Strategic reasoning:** planning moves and anticipating outcomes
* **Opponent modeling:** predicting and countering adversarial play
* **Deterministic decision-making:** consistent performance under clear rules
It is also a good starting environment for reinforcement learning or self-play fine-tuning of small or large language models.
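
As a rough illustration of self-play, the sketch below pits two uniformly random players against each other and tallies the outcomes. It is a standalone baseline under assumed names (`play_random_game`, `tally`) and does not use the environment's API; in a real evaluation, the `rng.choice(legal)` call would be replaced by a model's chosen move.

```python
import random

# The eight index triples that form a row, column, or diagonal.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def play_random_game(rng):
    """Play one game between two random players; return "X", "O", or "draw"."""
    cells = [None] * 9
    player = "X"  # X is assumed to move first in this sketch
    while True:
        legal = [i for i, cell in enumerate(cells) if cell is None]
        if not legal:
            return "draw"
        cells[rng.choice(legal)] = player
        if any(all(cells[i] == player for i in line) for line in WIN_LINES):
            return player
        player = "O" if player == "X" else "X"


def tally(n_games, seed=0):
    """Run n_games random games with a seeded generator and count outcomes."""
    rng = random.Random(seed)
    counts = {"X": 0, "O": 0, "draw": 0}
    for _ in range(n_games):
        counts[play_random_game(rng)] += 1
    return counts


print(tally(200))
```

Seeding the generator keeps a run reproducible, which makes such a random baseline a convenient reference point when comparing agents.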