This repo contains a set of games and my corresponding attempts at creating Reinforcement Learning Agents capable of beating these games.
- 2048 - "a single-player sliding tile puzzle video game written by Italian web developer Gabriele Cirulli and published on GitHub."
For now, this is the only game present in the repo. It is available as a playable online game under the `api/` and `frontend/` folders, and as a set of functions to train an agent on under `game/`.
- To play locally, first clone the repo and create a local environment (e.g. using conda) with Python 3.9.11 installed.
```
git clone https://github.com/andreaalf97/reinforcement-learning
conda create -n <env-name> python==3.9.11
conda activate <env-name>
```
- Install the required packages.
```
pip install -r playing_requirements.txt
```
- Start the API that hosts the game; it runs on port 5000.
```
python api/app.py
```
- Start the frontend server from the `frontend/` folder.
```
cd frontend
python -m http.server 1234
```
- Navigate to the game URL at http://localhost:1234/
The training strategy is based on Deep Q Networks (DQN), an adaptation of Q-Learning to high-dimensional state-action spaces. In particular, the implementation here is adapted from two existing articles on the topic.
- First, clone the repo and create a local environment (e.g. using conda) with Python 3.9.11 installed.
```
git clone https://github.com/andreaalf97/reinforcement-learning
conda create -n <env-name> python==3.9.11
conda activate <env-name>
```
- Install the requirements.
```
pip install -r training_requirements.txt
```
- Start training.
```
python main.py
```
- At the end of training, you will be shown two plots: the training loss convergence and the average total reward achieved over the episodes. The model parameters, the loss, the total loss and a copy of the final trained model are stored under `model_checkpoints/`, or wherever indicated by the `--store-run-at` parameter.
The game state is represented by a one-dimensional array of size 16. E.g. the following game state:
```
 2 |  4 | 4 | 0
 0 |  0 | 0 | 0
 4 |  4 | 0 | 0
16 | 16 | 8 | 0
```
is represented by the following array: `[2, 4, 4, 0, 0, 0, 0, 0, 4, 4, 0, 0, 16, 16, 8, 0]`.
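To make the mapping concrete, here is a minimal sketch of the board-to-array flattening, assuming the board is stored as a list of rows (the helper `flatten_board` is hypothetical, not part of the repo's code):

```python
# Hypothetical helper: flatten a 4x4 board (list of rows) into the
# 16-element state array described above, row by row.
def flatten_board(board):
    return [tile for row in board for tile in row]

board = [
    [2, 4, 4, 0],
    [0, 0, 0, 0],
    [4, 4, 0, 0],
    [16, 16, 8, 0],
]
print(flatten_board(board))
# [2, 4, 4, 0, 0, 0, 0, 0, 4, 4, 0, 0, 16, 16, 8, 0]
```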
The available actions are `right`, `left`, `up`, `down`.
Merging two tiles with the same number yields a reward equal to the value of the tile generated. If multiple tiles are merged in one move, the total reward is the sum of the single rewards. E.g. the reward for performing action `left` on the state above is `48 = 32 + 8 + 8` (the two 16s merge into a 32, and each pair of 4s merges into an 8).
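As an illustration of how that reward could be computed, the sketch below merges each row to the left and sums the values of the tiles created; it is a simplified re-implementation for explanatory purposes, not the repo's actual game logic:

```python
# Sketch: slide and merge a single row to the left, 2048-style, and
# return (new_row, reward), where the reward is the sum of the tiles created.
def merge_row_left(row):
    tiles = [t for t in row if t != 0]  # drop empty cells
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)   # two equal tiles merge...
            reward += tiles[i] * 2        # ...and the new value is the reward
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward

board = [[2, 4, 4, 0], [0, 0, 0, 0], [4, 4, 0, 0], [16, 16, 8, 0]]
print(sum(merge_row_left(row)[1] for row in board))  # 48
```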
The architecture used to learn the game consists of two linear layers with a ReLU activation function in between; the hidden dimension is given by the `--hidden-size` argument.
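As a rough PyTorch sketch of that description (the exact layer layout, the input size of 16 and the output of one Q-value per action are assumptions drawn from the state and action descriptions above, not the repo's code):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Two linear layers with a ReLU in between, as described above.
    def __init__(self, state_size=16, hidden_size=32, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_actions),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

model = QNetwork(hidden_size=32)        # hidden_size mirrors --hidden-size
q_values = model(torch.zeros(1, 16))    # batch of one flattened board
print(q_values.shape)                   # torch.Size([1, 4])
```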
For a full view of the parameters, run `python main.py --help`.
Parameter | Default value | Description |
---|---|---|
`--update-target-network-every` | 1000 | After how many game steps the main model weights are copied onto the target model |
`--update-main-network-every` | 16 | After how many game steps the main network is trained |
`--episodes` | 350 | How many game episodes will be simulated to collect training samples |
`--max-moves-per-episode` | 400 | How many moves are allowed per episode |
`--hidden-size` | 32 | The size of the model's hidden dimension |
`--random-seed` | 0 | The seed for all random number generators, for reproducibility purposes |
`--n-samples-to-train-on` | 1000 | How many samples (randomly picked from the replay memory) the model is trained on at each training step |
`--mini-batch-size` | 32 | The mini-batch size used for training |
`--min-replay-size` | 1000 | The minimum size of the replay memory before training can start |
`--epochs` | 1 | How many times training will go over the same randomly picked sample of the replay memory |
`--learning-rate` | 0.7 | The learning rate for the Bellman equation: `new_q = (1-lr)*(current_q) + (lr)*(new_q)` |
`--discount-factor` | 0.618 | The discount factor for the Bellman equation: `new_q = reward + DF * max(Q(next_state))` |