TicTacToe-v0
Overview
TicTacToe is a classic two-player strategy game played on a 3x3 grid. The goal is to be the first player to align three of your own marks (X or O) horizontally, vertically, or diagonally. Because winning requires anticipating the opponent's moves, blocking threats, and planning ahead, the game is a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).
Gameplay
- Players: 2
- Symbols: X and O
- Objective: Form a line of three of your symbols before your opponent.
- Board Layout:
  0 | 1 | 2
  ---+---+---
  3 | 4 | 5
  ---+---+---
  6 | 7 | 8
Players take turns selecting a cell by its index (0–8). The environment automatically validates moves and announces wins, losses, or draws.
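The indexing and move validation described above can be sketched in a few lines of Python. This is an illustration only, not the environment's actual code: the names `render`, `is_valid_move`, and `apply_move` are hypothetical, and showing the cell index in empty cells is an assumption based on the layout above.

```python
from typing import List, Optional

# Hypothetical board representation: 9 cells, None marks an empty cell.
Board = List[Optional[str]]


def render(board: Board) -> str:
    """Render the board as text, showing the index of each empty cell."""
    cells = [mark if mark is not None else str(i) for i, mark in enumerate(board)]
    rows = [" | ".join(cells[start:start + 3]) for start in (0, 3, 6)]
    return "\n---+---+---\n".join(rows)


def is_valid_move(board: Board, index: int) -> bool:
    """A move is valid if it is an integer in 0..8 and the cell is empty."""
    return isinstance(index, int) and 0 <= index <= 8 and board[index] is None


def apply_move(board: Board, index: int, symbol: str) -> Board:
    """Return a new board with `symbol` placed at `index`, or raise on an invalid move."""
    if not is_valid_move(board, index):
        raise ValueError(f"invalid move: {index!r}")
    new_board = list(board)
    new_board[index] = symbol
    return new_board
```

Returning a new board instead of mutating the old one keeps earlier states available, which is convenient when logging a game turn by turn.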
Environment Details
- Environment Name: TicTacToe-v0
- Number of Players: 2
- Observation Type: Text-based description of board state and game messages
- Action Type: Integer index (0–8)
- Winning Condition: Three identical symbols in a row, column, or diagonal
- Termination: When a player wins or all cells are filled (draw)
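The winning and termination conditions listed above amount to checking eight lines of three cells. A minimal sketch, assuming the board is a list of nine cells with `None` for empty cells; `WIN_LINES`, `winner`, and `game_status` are hypothetical names, not part of the environment's API:

```python
from typing import List, Optional

# The eight winning lines, using the 0-8 cell indices: rows, columns, diagonals.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[Optional[str]]) -> Optional[str]:
    """Return "X" or "O" if that symbol fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def game_status(board: List[Optional[str]]) -> str:
    """Classify a position as a win, a draw (board full, no winner), or ongoing."""
    won_by = winner(board)
    if won_by is not None:
        return f"{won_by} wins"
    if all(cell is not None for cell in board):
        return "draw"
    return "ongoing"
```

The win check runs before the full-board check so that a winning move into the last empty cell is reported as a win, not a draw.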
LLM Evaluation Purpose
TicTacToe serves as a benchmark for:
- Strategic reasoning: planning moves and anticipating outcomes
- Opponent modeling: predicting and countering adversarial play
- Deterministic decision-making: consistent performance under clear rules
It is also a good starting environment for reinforcement learning or self-play fine-tuning of small or large language models.
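As a rough picture of how self-play data might be collected, the sketch below plays one game between two uniformly random players and returns the move sequence with the outcome. It is a standalone toy, not the environment's interface: `play_random_episode` and the other names are hypothetical, and a real setup would replace the random choice with a model's policy.

```python
import random
from typing import List, Optional, Tuple

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[Optional[str]]) -> Optional[str]:
    """Return the symbol that fills a winning line, or None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_random_episode(seed: int) -> Tuple[List[int], str]:
    """Play one game with random legal moves; return (moves, "X" | "O" | "draw")."""
    rng = random.Random(seed)  # seeded so that an episode can be reproduced
    board: List[Optional[str]] = [None] * 9
    moves: List[int] = []
    symbol = "X"  # assumption for this sketch: X moves first
    while True:
        legal = [i for i, cell in enumerate(board) if cell is None]
        move = rng.choice(legal)
        board[move] = symbol
        moves.append(move)
        if winner(board) is not None:
            return moves, symbol
        if len(moves) == 9:
            return moves, "draw"
        symbol = "O" if symbol == "X" else "X"


if __name__ == "__main__":
    moves, result = play_random_episode(seed=0)
    print(moves, result)
```

Each returned pair can serve as one training episode, with the outcome acting as a sparse terminal reward for the moves that led to it.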