diff --git a/README.md b/README.md
index fd73f63..ba0bc4d 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,47 @@
 # TicTacToe-v0
 
-Environment created via Openverse
\ No newline at end of file
+### Overview
+
+TicTacToe is a classic two-player strategy game played on a 3x3 grid. The goal is to be the first player to align three of your marks, either `X` or `O`, horizontally, vertically, or diagonally. This simple yet elegant game tests players’ ability to anticipate, block, and plan moves ahead, making it a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).
+
+---
+
+### Gameplay
+
+* **Players:** 2
+* **Symbols:** `X` and `O`
+* **Objective:** Form a line of three of your symbols before your opponent.
+* **Board Layout:**
+
+  ```
+   0 | 1 | 2
+  ---+---+---
+   3 | 4 | 5
+  ---+---+---
+   6 | 7 | 8
+  ```
+
+Players take turns selecting a cell by its index (0–8). The environment automatically validates moves and announces wins, losses, or draws.
+
+---
+
+### Environment Details
+
+* **Environment Name:** `TicTacToe-v0`
+* **Number of Players:** 2
+* **Observation Type:** Text-based description of board state and game messages
+* **Action Type:** Integer index (0–8)
+* **Winning Condition:** Three identical symbols in a row, column, or diagonal
+* **Termination:** When a player wins or all cells are filled (draw)
+
+---
+
+### LLM Evaluation Purpose
+
+TicTacToe serves as a benchmark for:
+
+* **Strategic reasoning:** planning moves and anticipating outcomes
+* **Opponent modeling:** predicting and countering adversarial play
+* **Deterministic decision-making:** consistent performance under clear rules
+
+It is also a good starting environment for reinforcement learning or self-play fine-tuning of small or large language models.
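
The rules the README describes (cell indices 0–8, move validation, a win on three identical symbols in a row, column, or diagonal, and a draw when every cell is filled) can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual `TicTacToe-v0` implementation: the names `WIN_LINES`, `is_valid_move`, `apply_move`, `winner`, and `is_terminal` are hypothetical, and the board is assumed to be a flat list of nine one-character strings with `" "` marking an empty cell.

```python
from typing import List, Optional

# The eight winning lines, written with the 0-8 cell indices from the
# README's board layout. (Hypothetical helper, not part of the environment.)
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

EMPTY = " "  # assumed marker for an unoccupied cell


def is_valid_move(board: List[str], cell: int) -> bool:
    """A move is valid if the index is in 0-8 and the cell is empty."""
    return 0 <= cell <= 8 and board[cell] == EMPTY


def apply_move(board: List[str], cell: int, symbol: str) -> List[str]:
    """Return a new board with `symbol` placed at `cell`."""
    if not is_valid_move(board, cell):
        raise ValueError(f"invalid move: {cell}")
    new_board = board.copy()
    new_board[cell] = symbol
    return new_board


def winner(board: List[str]) -> Optional[str]:
    """Return 'X' or 'O' if either has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(board: List[str]) -> bool:
    """The game ends when a player wins or all cells are filled (draw)."""
    return winner(board) is not None or EMPTY not in board
```

For example, starting from `[" "] * 9` and applying `X` at cells 0, 1, 2 with `O` at cells 3, 4 in between leaves `winner(board)` returning `"X"` and `is_terminal(board)` returning `True`.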