TicTacToe-v0

Overview

TicTacToe is a classic two-player strategy game played on a 3x3 grid. The goal is to be the first player to align three of your marks (X or O) horizontally, vertically, or diagonally. The rules are simple, but the game tests a player's ability to anticipate, block, and plan ahead, which makes it a suitable environment for evaluating reasoning, prediction, and opponent modeling in large language models (LLMs).


Gameplay

  • Players: 2

  • Symbols: X and O

  • Objective: Form a line of three of your symbols before your opponent.

  • Board Layout:

     0 | 1 | 2
    ---+---+---
     3 | 4 | 5
    ---+---+---
     6 | 7 | 8
    

Players take turns selecting a cell by its index (0-8). The environment automatically validates moves and announces wins, losses, or draws.
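
As a rough illustration of the index scheme and move validation described above, here is a minimal Python sketch. It assumes the board is stored as a flat list of nine cells with a space marking an empty cell. The helper names (`new_board`, `index_to_row_col`, `is_valid_move`, `apply_move`) are hypothetical and are not part of the environment's API; the environment's own validation may differ.

```python
# Hypothetical helpers for illustration only; not the environment's API.
EMPTY = " "  # assumption: an empty cell is represented by a single space


def new_board():
    """Return an empty board as a flat list of 9 cells (indices 0-8)."""
    return [EMPTY] * 9


def index_to_row_col(index):
    """Map a cell index (0-8) to its (row, column) on the 3x3 grid."""
    return divmod(index, 3)


def is_valid_move(board, index):
    """A move is valid if the index is an integer in 0-8 and the cell is empty."""
    return isinstance(index, int) and 0 <= index <= 8 and board[index] == EMPTY


def apply_move(board, index, symbol):
    """Return a copy of the board with `symbol` placed at `index`."""
    if not is_valid_move(board, index):
        raise ValueError(f"invalid move: {index!r}")
    updated = list(board)
    updated[index] = symbol
    return updated
```

For example, `index_to_row_col(5)` gives `(1, 2)`, the middle row and right column, which matches cell 5 in the layout above.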


Environment Details

  • Environment Name: TicTacToe-v0
  • Number of Players: 2
  • Observation Type: Text-based description of board state and game messages
  • Action Type: Integer index (0-8)
  • Winning Condition: Three identical symbols in a row, column, or diagonal
  • Termination: When a player wins or all cells are filled (draw)
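
The winning and termination conditions listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's implementation: it assumes a flat list of nine cells with a space for an empty cell, and the names `WIN_LINES`, `winner`, and `game_status` are hypothetical.

```python
# Illustrative sketch; the names and board representation are assumptions.
# The eight index triples that form a row, column, or diagonal.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that symbol fills any line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def game_status(board):
    """Classify a board as a win, a draw (full board, no winner), or in progress."""
    w = winner(board)
    if w is not None:
        return f"{w} wins"
    if all(cell != " " for cell in board):
        return "draw"
    return "in progress"
```

The win check runs before the full-board check, so a winning move that fills the last empty cell is reported as a win and not as a draw.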

LLM Evaluation Purpose

TicTacToe serves as a benchmark for:

  • Strategic reasoning: planning moves and anticipating outcomes
  • Opponent modeling: predicting and countering adversarial play
  • Deterministic decision-making: consistent performance under clear rules

It is also a good starting environment for reinforcement learning or self-play fine-tuning of small or large language models.
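
As a starting point for self-play experiments, a uniformly random policy gives a simple baseline to compare a model against. The sketch below is standalone Python and does not use the environment's API. The function names (`winner`, `random_playout`, `tally`) and the board representation are assumptions made for illustration.

```python
import random

# Assumption: flat 9-cell board, " " marks an empty cell, X moves first.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return the symbol that fills any line, or None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(rng):
    """Play one game with both sides choosing uniformly among empty cells.

    Returns "X", "O", or "draw".
    """
    board = [" "] * 9
    symbol = "X"
    while True:
        legal = [i for i, cell in enumerate(board) if cell == " "]
        if not legal:
            return "draw"
        board[rng.choice(legal)] = symbol
        if winner(board) == symbol:
            return symbol
        symbol = "O" if symbol == "X" else "X"


def tally(num_games, seed=0):
    """Count outcomes over `num_games` random playouts with a seeded generator."""
    rng = random.Random(seed)
    counts = {"X": 0, "O": 0, "draw": 0}
    for _ in range(num_games):
        counts[random_playout(rng)] += 1
    return counts
```

Calling `tally(1000)` returns a dictionary of outcome counts. A policy under evaluation could replace the `rng.choice(legal)` call for one of the two symbols.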