Reinforcement Learning 2048

An AI bot that plays the game 2048 using reinforcement learning.

Overview

Demo

The Elements of the 2048 Reinforcement Learning Problem

  • Objective: Get the highest score / max tile, i.e. survive as long as possible while keeping the board in a good state.
  • State: A 4x4 grid of tiles whose values are powers of 2.
  • Action: Shift the board UP, DOWN, LEFT, or RIGHT.
  • Reward: The increment of the score, or the score combined with other metrics. (These elements are sketched in code below.)
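To make these elements concrete, here is a minimal Python sketch. The names (Action, state) are illustrative only, not the project's actual API.

import numpy as np
from enum import Enum

class Action(Enum):
    # The four possible moves.
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3

# State: a 4x4 grid; 0 marks an empty cell, every other cell holds a power of 2.
state = np.array([[0, 0, 2, 0],
                  [0, 4, 2, 0],
                  [0, 0, 8, 2],
                  [2, 2, 4, 16]])

# Reward: the score increment of one move, e.g. merging two 4-tiles into
# an 8 adds 8 to the score, so that move yields a reward of 8.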

Usage

Dependencies

  • tensorflow
  • numpy
  • pyyaml

Basic Game Play

$ python3 RL2048/Game/Play.py
Play mode:
1. Keyboard (use w, a, s, d, exit with ^C or ^D)
2. Random

 select:
  • Keyboard mode: control the board manually with w, a, s, d
  • Random mode: moves are chosen at random

Training the Model

$ python3 RL2048/Learning/backward.py
  • TRAIN_MODE.NORMAL: normal training process
    • Uses only the network itself to choose actions
  • TRAIN_MODE.WITH_RANDOM: training with exploration
    • With a small chance, a move is chosen at random (see the sketch below)
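A minimal sketch of what WITH_RANDOM-style action selection might look like, reusing the Action enum from the sketch above; EPSILON and network.predict are illustrative assumptions, not the project's actual code.

import random

EPSILON = 0.1  # illustrative exploration rate

def choose_action(network, state):
    # With a small probability, explore with a random move;
    # otherwise exploit the network's prediction.
    if random.random() < EPSILON:
        return random.choice(list(Action))
    return network.predict(state)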

Statistics Report

$ python3 RL2048/Report/Statistics.py
  • Success Rate of Tiles (how often each tile value is reached)
  • Scores Diagram
  • Loss Diagram (TODO)

Default file locations

  • Model (ckpt): ./model
  • Last game status: training_game.yaml
  • Training log: training.log
  • Statistics report: ./report/StatisticsResult.md

If Python cannot find the RL2048 module (ModuleNotFoundError: No module named 'RL2048'), make sure your working directory is the root of this project, then run the scripts like this:

export PYTHONPATH=$PYTHONPATH:/path/to/this/project/ReinforcementLearning2048; python3 RL2048/Learning/backward.py

Alternatively, add the following lines at the top of each script:

import sys
sys.path.append('/path/to/this/project/ReinforcementLearning2048')

Policy Gradient

Heuristic: Artificial Intelligence: as much "artificial" hand-crafting, as much "intelligence"!

Epsilon Decay

With a probability that decays over time, control of the move is handed to a "Teacher" policy (see the sketch below).
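A minimal sketch of epsilon decay, assuming a multiplicative schedule; teacher_policy, network.predict, and the constants are illustrative assumptions, not the project's code.

import random

epsilon = 1.0       # start mostly guided by the Teacher
DECAY = 0.999       # multiplicative decay per step (illustrative)
MIN_EPSILON = 0.01  # never stop consulting the Teacher entirely

def select_action(network, teacher_policy, state):
    global epsilon
    epsilon = max(MIN_EPSILON, epsilon * DECAY)
    # With probability epsilon, the "Teacher" takes control of the move.
    if random.random() < epsilon:
        return teacher_policy(state)
    return network.predict(state)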

Random

Traditional Tree-search algorithm

The Monte Carlo tree search algorithm, guided by board-evaluation heuristics such as (see the sketch below):

  • Monotonicity: tile values steadily increase or decrease along rows and columns
  • Smoothness: neighboring tiles hold similar values
  • Free Tiles: the number of empty cells
  • Z-shape: tiles arranged in a snake-like descending pattern

(Minimax search with alpha-beta pruning is an alternative.)
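A minimal sketch of two of these heuristics, assuming the numpy grid representation from above; the terms and weights in evaluate are illustrative, not the project's actual evaluation function.

import numpy as np

def free_tiles(grid):
    # More empty cells generally means a safer board.
    return int((grid == 0).sum())

def monotonic(line):
    # 1 if the values along a row/column never increase or never decrease.
    diffs = np.diff(line)
    return int((diffs <= 0).all() or (diffs >= 0).all())

def evaluate(grid):
    # Illustrative weighted sum; the real terms and weights are project-specific.
    mono = sum(monotonic(row) for row in grid) + sum(monotonic(col) for col in grid.T)
    return 2.0 * free_tiles(grid) + 1.0 * mono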

Result of Policy Gradient

We found that Policy Gradient is not a good approach for 2048.

The main reason is that 2048 has a "local comfort zone": sometimes you must take a locally bad action, because the direction you actually want to move in is invalid.

Problems

  • The network keeps taking invalid actions (see the sketch below).
  • The loss became too small, and the network seemed to learn nothing in the first 100 rounds. (The too-small-loss problem was solved, but the network still learned nothing.)
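To make "invalid action" concrete: a move is invalid when it would leave the board unchanged. A common mitigation, not necessarily what this project does, is to mask such moves out. Here apply_move is a hypothetical helper that returns the board after a move without spawning a new tile, and Action is the enum from the earlier sketch.

import numpy as np

def is_valid(grid, action):
    # A move is invalid if it does not change the board.
    return not np.array_equal(apply_move(grid, action), grid)

def mask_invalid(grid, action_scores):
    # Push invalid actions to -inf so the agent can never select them.
    return [score if is_valid(grid, action) else float("-inf")
            for action, score in zip(Action, action_scores)]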

MCTS Policy Gradient

Random Policy Gradient

Ideas:

  • Use random play to build a history, and use a DQN to observe the patterns in it.
  • Use MCTS to build an experience history, then teach the DQN how to play directly from it (see the sketch below).
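A minimal sketch of recording an experience history from any teacher policy (random play or MCTS) and replaying it to a learner; env, policy, and learner.train_on_batch are illustrative assumptions, not the project's API.

import random
from collections import deque

# Experience buffer of (state, action, reward, next_state, done) tuples.
history = deque(maxlen=10000)

def collect(env, policy, episodes):
    # Let the teacher policy generate experience.
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            history.append((state, action, reward, next_state, done))
            state = next_state

def replay(learner, batch_size=32):
    # Train the DQN on random batches sampled from the recorded history.
    learner.train_on_batch(random.sample(list(history), batch_size))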

Deep Q-Learning (DQN)

Improvements/Adjustments

  1. Grid preprocessing
    • One-hot encoding of tile values (see the sketch below)
  2. Feeding batches of state-action pairs
  3. Loss function
  4. Q-learning discount factor (gamma)
  5. Experience
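A minimal sketch of one-hot grid preprocessing and the gamma-discounted Q-learning target; the 16-channel depth and the constants are illustrative assumptions, not the project's exact implementation.

import numpy as np

GAMMA = 0.99  # discount factor (illustrative value)
DEPTH = 16    # channel 0 for empty cells, channels 1..15 for tiles 2^1..2^15

def one_hot(grid):
    # Map a 4x4 grid of tile values to a 4x4x16 binary tensor.
    encoded = np.zeros((4, 4, DEPTH), dtype=np.float32)
    for i in range(4):
        for j in range(4):
            channel = 0 if grid[i][j] == 0 else int(np.log2(grid[i][j]))
            encoded[i, j, channel] = 1.0
    return encoded

def q_target(reward, next_q_values, done):
    # Bellman target r + gamma * max_a' Q(s', a'); no bootstrap on terminal states.
    return reward if done else reward + GAMMA * float(np.max(next_q_values))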

Notes

Links

Similar Projects

Use Machine Learning

Use Traditional AI

Simple Game Play

Articles and Papers

AlphaGo

Others