Update README.md

2025-11-08 16:29:43 +00:00
parent e2f463b508
commit 84d498c675
1 changed file with 45 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -1,3 +1,47 @@
 # TicTacToe-v0

-Environment created via Openverse
+### Overview
+
+TicTacToe is a classic two-player strategy game played on a 3×3 grid. The goal is to be the first player to align three of your marks (`X` or `O`) horizontally, vertically, or diagonally. The game is small enough to be solved completely (perfect play by both sides always ends in a draw), but winning or avoiding defeat still requires anticipating threats, blocking them, and planning moves ahead. This makes it a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).
+
+---
+
+### Gameplay
+
+* **Players:** 2
+* **Symbols:** `X` and `O`
+* **Objective:** Form a line of three of your symbols before your opponent does.
+* **Board Layout:**
+
+  ```
+   0 | 1 | 2
+  ---+---+---
+   3 | 4 | 5
+  ---+---+---
+   6 | 7 | 8
+  ```
+
+Players alternate turns, each selecting an empty cell by its index (0–8). The environment automatically validates moves and announces wins, losses, or draws.
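+
+The layout above maps index `i` to row `i // 3` and column `i % 3`. The sketch below shows one way move validation and text rendering could work. It assumes a board stored as a list of nine cells holding `"X"`, `"O"`, or `None`; the helpers `validate_move` and `render_board` are hypothetical illustrations, not the environment's actual code.
+
+```python
+from typing import List, Optional
+
+Board = List[Optional[str]]  # nine cells: "X", "O", or None (empty)
+
+
+def validate_move(board: Board, index: int) -> None:
+    """Raise ValueError if `index` is not a legal move on `board`."""
+    if not isinstance(index, int) or not 0 <= index <= 8:
+        raise ValueError(f"{index!r} is not a cell index in 0-8")
+    if board[index] is not None:
+        raise ValueError(f"cell {index} is already taken by {board[index]}")
+
+
+def render_board(board: Board) -> str:
+    """Draw the 3x3 grid, showing the index of each empty cell."""
+    cells = [mark if mark is not None else str(i) for i, mark in enumerate(board)]
+    rows = [" " + " | ".join(cells[r * 3:r * 3 + 3]) for r in range(3)]
+    return "\n---+---+---\n".join(rows)
+
+
+board: Board = [None] * 9
+board[4] = "X"           # X takes the center
+validate_move(board, 0)  # legal: cell 0 is still empty
+print(render_board(board))
+```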
+
+---
+
+### Environment Details
+
+* **Environment Name:** `TicTacToe-v0`
+* **Number of Players:** 2
+* **Observation Type:** Text-based description of board state and game messages
+* **Action Type:** Integer index (0–8)
+* **Winning Condition:** Three identical symbols in a row, column, or diagonal
+* **Termination:** When a player wins, or when all nine cells are filled without a winner (draw)
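+
+The winning condition and termination rule reduce to a check over the eight possible lines. This is a hedged sketch using the same assumed list-of-nine board representation; `WIN_LINES`, `winner`, and `outcome` are hypothetical names. The win check runs before the draw check, so a final move that both fills the board and completes a line counts as a win.
+
+```python
+from typing import Optional, Sequence
+
+# The eight winning lines: three rows, three columns, two diagonals.
+WIN_LINES = (
+    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
+    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
+    (0, 4, 8), (2, 4, 6),             # diagonals
+)
+
+
+def winner(board: Sequence[Optional[str]]) -> Optional[str]:
+    """Return "X" or "O" if that player owns a full line, else None."""
+    for a, b, c in WIN_LINES:
+        if board[a] is not None and board[a] == board[b] == board[c]:
+            return board[a]
+    return None
+
+
+def outcome(board: Sequence[Optional[str]]) -> Optional[str]:
+    """Return "X", "O", "draw", or None while the game is still in progress."""
+    w = winner(board)
+    if w is not None:
+        return w
+    if all(cell is not None for cell in board):
+        return "draw"
+    return None
+
+
+print(outcome(["X", "X", "X", None, "O", "O", None, None, None]))  # X
+```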
+
+---
+
+### LLM Evaluation Purpose
+
+TicTacToe serves as a benchmark for:
+
+* **Strategic reasoning:** planning moves and anticipating outcomes
+* **Opponent modeling:** predicting and countering adversarial play
+* **Deterministic decision-making:** choosing legal, consistent moves under fully specified, deterministic rules
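+
+Because the game tree is small, a perfect-play reference opponent for these evaluations can be built with exhaustive minimax search. The sketch below is one illustrative way to do this, not a component the environment is documented to provide. `minimax`, `best_move`, and `play` are hypothetical helpers, and boards are assumed to be 9-tuples of `"X"`, `"O"`, or `None` so that results can be cached.
+
+```python
+from functools import lru_cache
+from typing import Optional, Tuple
+
+Board = Tuple[Optional[str], ...]  # 9-tuple, hashable so it can be cached
+
+WIN_LINES = (
+    (0, 1, 2), (3, 4, 5), (6, 7, 8),
+    (0, 3, 6), (1, 4, 7), (2, 5, 8),
+    (0, 4, 8), (2, 4, 6),
+)
+
+
+def winner(board: Board) -> Optional[str]:
+    for a, b, c in WIN_LINES:
+        if board[a] is not None and board[a] == board[b] == board[c]:
+            return board[a]
+    return None
+
+
+def play(board: Board, index: int, player: str) -> Board:
+    """Return a new board with `player` placed at `index`."""
+    return board[:index] + (player,) + board[index + 1:]
+
+
+@lru_cache(maxsize=None)
+def minimax(board: Board, player: str) -> int:
+    """Value under perfect play with `player` to move: +1 X wins, -1 O wins, 0 draw."""
+    w = winner(board)
+    if w is not None:
+        return 1 if w == "X" else -1
+    empty = [i for i, cell in enumerate(board) if cell is None]
+    if not empty:
+        return 0
+    opponent = "O" if player == "X" else "X"
+    values = [minimax(play(board, i, player), opponent) for i in empty]
+    return max(values) if player == "X" else min(values)
+
+
+def best_move(board: Board, player: str) -> int:
+    """Pick an optimal empty cell for `player` (X maximizes, O minimizes)."""
+    opponent = "O" if player == "X" else "X"
+    empty = [i for i, cell in enumerate(board) if cell is None]
+    choose = max if player == "X" else min
+    return choose(empty, key=lambda i: minimax(play(board, i, player), opponent))
+
+
+print(minimax((None,) * 9, "X"))  # 0, i.e. perfect play ends in a draw
+```
+
+Comparing a model's chosen cell with `best_move` (for example, whether it blocks an immediate threat) is one possible way to quantify the strategic reasoning and opponent modeling listed above.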
+
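+Because each game lasts at most nine moves and there are fewer than 3^9 = 19,683 possible board configurations, TicTacToe is also a convenient starting environment for reinforcement learning or self-play fine-tuning of language models, small or large.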
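+
+As an illustration of the self-play setting, the loop below plays complete games between two copies of a uniformly random policy and records each move. It is a hedged sketch: the random policy stands in for the model being trained, `play_episode` is a hypothetical name, `X` is assumed to move first, and a real pipeline would step the `TicTacToe-v0` environment instead of this local board.
+
+```python
+import random
+from typing import List, Optional, Tuple
+
+WIN_LINES = (
+    (0, 1, 2), (3, 4, 5), (6, 7, 8),
+    (0, 3, 6), (1, 4, 7), (2, 5, 8),
+    (0, 4, 8), (2, 4, 6),
+)
+
+
+def winner(board: List[Optional[str]]) -> Optional[str]:
+    for a, b, c in WIN_LINES:
+        if board[a] is not None and board[a] == board[b] == board[c]:
+            return board[a]
+    return None
+
+
+def play_episode(rng: random.Random) -> Tuple[List[Tuple[str, int]], str]:
+    """Play one self-play game; return the (player, move) history and the result."""
+    board: List[Optional[str]] = [None] * 9
+    history: List[Tuple[str, int]] = []
+    player = "X"  # assumption: X moves first
+    while True:
+        legal = [i for i, cell in enumerate(board) if cell is None]
+        move = rng.choice(legal)  # stand-in for the policy being trained
+        board[move] = player
+        history.append((player, move))
+        w = winner(board)
+        if w is not None:
+            return history, w
+        if len(history) == 9:
+            return history, "draw"
+        player = "O" if player == "X" else "X"
+
+
+if __name__ == "__main__":
+    rng = random.Random(0)
+    results = [play_episode(rng)[1] for _ in range(1000)]
+    print({r: results.count(r) for r in ("X", "O", "draw")})
+```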