This is a tic-tac-toe game built using Q-learning, a reinforcement learning algorithm.
After training for 100,000 episodes against a computer making random moves, the agent won around 75% of its games. The average reward was approximately 0.7, and the maximum reward exceeded 0.9.
Q-learning algorithm:
Q-learning is a model-free, value-based, off-policy reinforcement learning (RL) algorithm
based on the Bellman equation. It uses a Q-table to store Q-values for state-action pairs,
which represent the expected future rewards for taking specific actions in specific states.
Steps in Q-Learning Algorithm:
1: Initialize the Q-values for all state-action pairs arbitrarily (often to zero).
2: Observe the current state s.
3: Select an action a based on the current policy (e.g., ε-greedy).
4: Perform the action a and observe the reward r and the next state s'.
5: Update the Q-value using the Q-learning equation.
6: Set the current state s to the next state s'.
7: Repeat steps 2-6 until the termination condition is met (a minimal loop sketch in code follows below).
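Putting the steps together, a minimal training loop might look like the sketch below. The environment interface (reset(), available_actions(), step()) is illustrative and does not necessarily match the classes in src/.

```python
import random
from collections import defaultdict

def q_learning(env, num_episodes=100_000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning loop (steps 1-7 above).

    `env` is assumed to expose reset() -> state, available_actions(state) -> list
    of legal moves, and step(action) -> (next_state, reward, done).
    """
    q = defaultdict(float)  # step 1: Q-values default to zero

    for _ in range(num_episodes):
        state = env.reset()                      # step 2: observe the current state
        done = False
        while not done:
            actions = env.available_actions(state)
            if random.random() < epsilon:        # step 3: epsilon-greedy action selection
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)   # step 4

            # step 5: Q-learning update
            next_actions = env.available_actions(next_state)
            best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

            state = next_state                   # step 6
    return q
```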
Q-table:
The Q-table is defined as a list and is stored as a CSV file in the following format:
state, action, q_value
"[[0, 0, 0], [0, 0, 0], [0, 0, 0]]", 1, 0.0
Q-value Update Equation:
The Q-values are updated using the standard Q-learning update equation (a worked numeric example follows the symbol list below):
Q_new(s, a) = Q(s, a) + α * (r + γ * max(Q(s', a')) - Q(s, a))
Where:
- Q(s, a): Current Q-value for the state-action pair (s, a)
- α (alpha): Learning rate
- γ (gamma): Discount factor
- max(Q(s', a')): Maximum Q-value over the actions available in the next state s'
- s: Current state
- s': Next state
- a: Action taken in the current state
- r: Reward received after taking action a
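For example, with α = 0.1, γ = 0.9, a current value Q(s, a) = 0.2, an intermediate reward r = 0, and max(Q(s', a')) = 0.5, the update gives Q_new(s, a) = 0.2 + 0.1 * (0 + 0.9 * 0.5 - 0.2) = 0.225.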
Reward values:
- Win: +1
- Tie: +0.5
- Loss: -1
Reinforcement Learning (RL) Environment:
- Action: A move, numbered 1 to 9, corresponding to a cell on the board.
- State: Board configuration represented as a string, e.g., "[[0, 0, 0], [0, 0, 0], [0, 0, 0]]" (see the encoding sketch after this list).
- Reward:
- Win: +1
- Tie: +0.5
- Loss: -1
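As a small sketch of the state encoding and reward mapping (the 0 = empty, 1 = X, 2 = O cell encoding is an assumption; the actual encoding in board.py may differ):

```python
def encode_state(board):
    """Encode a 3x3 board (list of lists; 0 = empty, 1 = X, 2 = O assumed)
    as the string used as the Q-table key, e.g. "[[0, 0, 0], [0, 0, 0], [0, 0, 0]]"."""
    return str(board)

def reward_for(outcome):
    """Map the outcome of a finished game ('win', 'tie' or 'lose') to its reward."""
    return {"win": 1.0, "tie": 0.5, "lose": -1.0}[outcome]
```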
Links:
- Introductions to the Q-learning algorithm:
https://towardsdatascience.com/an-ai-agent-learns-to-play-tic-tac-toe-part-3-training-a-q-learning-rl-agent-2871cef2faf0
https://medium.com/@ardra4/tic-tac-toe-using-q-learning-a-reinforcement-learning-approach-d606cfdd64a3
https://medium.com/@kaneel.senevirathne/teaching-agents-to-play-tic-tac-toe-using-reinforcement-learning-7a9d4d6ee9b3
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial?dc_referrer=https%3A%2F%2Fwww.google.com%2F
https://towardsdatascience.com/reinforcement-learning-implement-tictactoe-189582bea542
├── src # codes for tic-tac-toe environment
│ ├── board.py # for board
│ ├── game.py # for game
│ ├── move.py # for move
│ ├── player.py # for player
│ └── rl.py # for Q-learning algorithm
├── training # codes for training
│ ├── training_results # training result files
│ │ └── plan1 # training plan1 result files
│ ├── training.py # for training
│ ├── result_analysis.py # for analyzing the training result
│ ├── training_result.csv # training result file
│ └── q_table.csv # Q-table file generated after 100,000 episodes
├── main.py # Run the app
├── .gitignore
├── requirements.txt
└── README.md
- Clone this project
git clone https://github.com/taka-rl/tic-tac-toe_q_learning.git
- If you would like to only play Tic-Tac-Toe, please see "Play Tic-Tac-Toe".
- If you would like to train the agent, install the required libraries with the following command:
On Windows: python -m pip install -r requirements.txt
On macOS: pip3 install -r requirements.txt
If you simply want to play tic-tac-toe, run main.py.
Choose a game mode between 1 and 6.
When you are ready, run training.py.
After the training, q_table.csv and training_result.csv files are saved in the training folder.
q_table.csv looks like this.
training_result.csv looks like this.
Make sure the identifier value in CONFIGURATIONS defined in config.py matches the result files you want to analyze.
Run result_analysis.py; you will see the figures shown in the Result section.
Training environment
- During training, the agent plays against a computer that makes random moves.
- Reward setting: win=1, tie=0.5, lose=-1
- The parameter settings in config.py are as follows (a possible Config definition is sketched below):
CONFIGURATIONS = [
Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=1000, identifier="training1_1"),
Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=10000, identifier="training1_2"),
Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=100000, identifier="training1_3"),
]
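For reference, Config could be defined roughly like this (an assumed sketch; the actual definition in config.py may differ):

```python
from dataclasses import dataclass

@dataclass
class Config:
    """One training run's hyperparameters; fields mirror the entries above."""
    learning_rate: float     # alpha in the Q-value update
    discount_factor: float   # gamma in the Q-value update
    epsilon: float           # exploration rate for the epsilon-greedy policy
    num_episodes: int        # number of training episodes
    identifier: str          # used to name the result files
```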
Expectation: As the number of episodes increases, the average reward is expected to increase.
Result: The average reward increased, and the number of wins also rose.
The average reward was calculated every 100 games.
(Figures: average reward curves for 1,000, 10,000, and 100,000 episodes.)
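The per-block averaging could be computed along these lines (a sketch assuming training_result.csv contains one row per episode with a 'reward' column, which may not match the actual file layout):

```python
import csv

def average_reward_per_block(path="training_result.csv", block=100):
    """Return the mean reward for each consecutive block of `block` episodes."""
    with open(path, newline="") as f:
        rewards = [float(row["reward"]) for row in csv.DictReader(f)]
    return [sum(rewards[i:i + block]) / block
            for i in range(0, len(rewards) - block + 1, block)]
```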
Win/Lose/Tie:
Over the course of training, the number of wins increased and the number of losses gradually decreased.