Reinforcement Learning 2048

An AI bot that plays the game 2048 using reinforcement learning.

Overview

Demo

The Elements of the 2048 Reinforcement Learning Problem

  • Objective: Get the highest score / max tile, i.e. survive as long as possible while keeping the board in a good state.
  • State: A 4x4 grid of tiles whose values are powers of 2.
  • Action: Shift the board UP, DOWN, LEFT, or RIGHT.
  • Reward: The increment of the score, or the score combined with other metrics. (These elements are sketched in code below.)
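To make these elements concrete, here is a minimal Python sketch. The names (Action, state) are illustrative only, not the project's actual API.

import numpy as np
from enum import Enum

class Action(Enum):
    # The four possible moves.
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3

# State: a 4x4 grid; 0 marks an empty cell, every other cell holds a power of 2.
state = np.array([[0, 0, 2, 0],
                  [0, 4, 2, 0],
                  [0, 0, 8, 2],
                  [2, 2, 4, 16]])

# Reward: the score increment of one move, e.g. merging two 4-tiles into
# an 8 adds 8 to the score, so that move yields a reward of 8.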

Usage

Dependencies

  • tensorflow
  • numpy
  • pyyaml

Basic Game Play

$ python3 RL2048/Game/Play.py
Play mode:
1. Keyboard (use w, a, s, d, exit with ^C or ^D)
2. Random

 select:
  • Keyboard mode: control the board manually with w, a, s, d
  • Random mode: moves are chosen at random

Training the Model

$ python3 RL2048/Learning/backward.py
  • TRAIN_MODE.NORMAL: normal training process
    • Uses only the network itself to choose actions
  • TRAIN_MODE.WITH_RANDOM: training with exploration
    • With a small chance, a move is chosen at random (see the sketch below)
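A minimal sketch of what WITH_RANDOM-style action selection might look like, reusing the Action enum from the sketch above; EPSILON and network.predict are illustrative assumptions, not the project's actual code.

import random

EPSILON = 0.1  # illustrative exploration rate

def choose_action(network, state):
    # With a small probability, explore with a random move;
    # otherwise exploit the network's prediction.
    if random.random() < EPSILON:
        return random.choice(list(Action))
    return network.predict(state)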

Statistics Report

$ python3 RL2048/Report/Statistics.py
  • Success Rate of Tiles (how often each tile value is reached)
  • Scores Diagram
  • Loss Diagram (TODO)

Default file locations

  • Model (ckpt): ./model
  • Last game status: training_game.yaml
  • Training log: training.log
  • Statistics report: ./report/StatisticsResult.md

If Python cannot find the RL2048 module (ModuleNotFoundError: No module named 'RL2048'), make sure your working directory is the root of this project, then run the scripts like this:

export PYTHONPATH=$PYTHONPATH:/path/to/this/project/ReinforcementLearning2048; python3 RL2048/Learning/backward.py

Alternatively, add the following lines at the top of each script:

import sys
sys.path.append('/path/to/this/project/ReinforcementLearning2048')

Policy Gradient

Heuristic: Artificial Intelligence: as much "artificial" hand-crafting, as much "intelligence"!

Epsilon Decay

With a probability that decays over time, control of the move is handed to a "Teacher" policy (see the sketch below).
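A minimal sketch of epsilon decay, assuming a multiplicative schedule; teacher_policy, network.predict, and the constants are illustrative assumptions, not the project's code.

import random

epsilon = 1.0       # start mostly guided by the Teacher
DECAY = 0.999       # multiplicative decay per step (illustrative)
MIN_EPSILON = 0.01  # never stop consulting the Teacher entirely

def select_action(network, teacher_policy, state):
    global epsilon
    epsilon = max(MIN_EPSILON, epsilon * DECAY)
    # With probability epsilon, the "Teacher" takes control of the move.
    if random.random() < epsilon:
        return teacher_policy(state)
    return network.predict(state)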

Random

Traditional Tree-search algorithm

The Monte Carlo tree search algorithm, guided by board-evaluation heuristics such as (see the sketch below):

  • Monotonicity: tile values steadily increase or decrease along rows and columns
  • Smoothness: neighboring tiles hold similar values
  • Free Tiles: the number of empty cells
  • Z-shape: tiles arranged in a snake-like descending pattern

(Minimax search with alpha-beta pruning is an alternative.)
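A minimal sketch of two of these heuristics, assuming the numpy grid representation from above; the terms and weights in evaluate are illustrative, not the project's actual evaluation function.

import numpy as np

def free_tiles(grid):
    # More empty cells generally means a safer board.
    return int((grid == 0).sum())

def monotonic(line):
    # 1 if the values along a row/column never increase or never decrease.
    diffs = np.diff(line)
    return int((diffs <= 0).all() or (diffs >= 0).all())

def evaluate(grid):
    # Illustrative weighted sum; the real terms and weights are project-specific.
    mono = sum(monotonic(row) for row in grid) + sum(monotonic(col) for col in grid.T)
    return 2.0 * free_tiles(grid) + 1.0 * mono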

Result of Policy Gradient

We found that Policy Gradient is not a good approach for 2048.

The main reason is that 2048 has a "local comfort zone": sometimes you must take a locally bad action, because the direction you actually want to move in is invalid.

Problems

  • The network keeps taking invalid actions (see the sketch below).
  • The loss became too small, and the network seemed to learn nothing in the first 100 rounds. (The too-small-loss problem was solved, but the network still learned nothing.)
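To make "invalid action" concrete: a move is invalid when it would leave the board unchanged. A common mitigation, not necessarily what this project does, is to mask such moves out. Here apply_move is a hypothetical helper that returns the board after a move without spawning a new tile, and Action is the enum from the earlier sketch.

import numpy as np

def is_valid(grid, action):
    # A move is invalid if it does not change the board.
    return not np.array_equal(apply_move(grid, action), grid)

def mask_invalid(grid, action_scores):
    # Push invalid actions to -inf so the agent can never select them.
    return [score if is_valid(grid, action) else float("-inf")
            for action, score in zip(Action, action_scores)]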

MCTS Policy Gradient

Random Policy Gradient

Ideas:

  • Use random play to build a history, and use a DQN to observe the patterns in it.
  • Use MCTS to build an experience history, then teach the DQN how to play directly from it (see the sketch below).
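A minimal sketch of recording an experience history from any teacher policy (random play or MCTS) and replaying it to a learner; env, policy, and learner.train_on_batch are illustrative assumptions, not the project's API.

import random
from collections import deque

# Experience buffer of (state, action, reward, next_state, done) tuples.
history = deque(maxlen=10000)

def collect(env, policy, episodes):
    # Let the teacher policy generate experience.
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            history.append((state, action, reward, next_state, done))
            state = next_state

def replay(learner, batch_size=32):
    # Train the DQN on random batches sampled from the recorded history.
    learner.train_on_batch(random.sample(list(history), batch_size))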

Deep Q-Learning (DQN)

Improvements/Adjustments

  1. Grid preprocessing
    • One-hot encoding of tile values (see the sketch below)
  2. Feeding batches of state-action pairs
  3. Loss function
  4. Q-learning discount factor (gamma)
  5. Experience
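A minimal sketch of one-hot grid preprocessing and the gamma-discounted Q-learning target; the 16-channel depth and the constants are illustrative assumptions, not the project's exact implementation.

import numpy as np

GAMMA = 0.99  # discount factor (illustrative value)
DEPTH = 16    # channel 0 for empty cells, channels 1..15 for tiles 2^1..2^15

def one_hot(grid):
    # Map a 4x4 grid of tile values to a 4x4x16 binary tensor.
    encoded = np.zeros((4, 4, DEPTH), dtype=np.float32)
    for i in range(4):
        for j in range(4):
            channel = 0 if grid[i][j] == 0 else int(np.log2(grid[i][j]))
            encoded[i, j, channel] = 1.0
    return encoded

def q_target(reward, next_q_values, done):
    # Bellman target r + gamma * max_a' Q(s', a'); no bootstrap on terminal states.
    return reward if done else reward + GAMMA * float(np.max(next_q_values))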

Notes

Links

Similar Projects

Use Machine Learning

Use Traditional AI

Simple Game Play

Articles and Papers

AlphaGo

Others