Testing each type of machine learning algorithm to compare their strengths and weaknesses.
python3 dqn.py
The deep Q-network actually surprised me quite a bit. I thought it would take longer and need more episodes, since it lacks the noise 'rejection' that a double deep Q-network has.
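For context, the standard DQN target uses the same (target) network to both select and evaluate the next action, which is where the overestimation noise comes from. A minimal sketch, assuming a PyTorch-style implementation (illustrative only, not the exact code in dqn.py):

```python
import torch

def dqn_targets(rewards, next_states, dones, target_net, gamma=0.99):
    """Standard DQN bootstrap target: r + gamma * max_a Q_target(s', a)."""
    with torch.no_grad():
        # The max over actions both SELECTS and EVALUATES the next action
        # with the same target network -- the source of overestimation bias.
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1 - dones)
```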
python3 ddqn.py
The double deep Q-network performed largely as expected, but took more episodes than I was expecting.
Surprisingly, the Deep Q-Network was more stable and trained faster than the 'new and improved' Double Deep Q-Network developed by DeepMind. It is possible that, specifically in LunarLander-v2, the DQN doesn't suffer from the overestimation bias that the DDQN was developed to solve.
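Double DQN changes only the target computation: the online network selects the next action and the target network evaluates it. A sketch of that decoupling, again assuming a PyTorch-style setup (the names online_net and target_net are mine, not necessarily those in ddqn.py):

```python
import torch

def ddqn_targets(rewards, next_states, dones, online_net, target_net, gamma=0.99):
    """Double DQN target: decouple action selection from evaluation."""
    with torch.no_grad():
        # Online network SELECTS the greedy next action...
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...and the target network EVALUATES it, reducing overestimation.
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * next_q * (1 - dones)
```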
python3 reinforce.py
The REINFORCE algorithm was stable, but appeared to be very slow at achieving any greater performance than it did. REINFORCE also surprised me with how noisy it was during training; without the averaging curve I wouldn't have been able to tell it was improving at all.
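That noisiness is inherent to REINFORCE's Monte Carlo returns. Here is a minimal sketch of the episode-level update, assuming a PyTorch policy that records log-probabilities during the rollout (illustrative, not the exact code in reinforce.py):

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Policy-gradient loss for one episode: -sum_t log pi(a_t|s_t) * G_t."""
    # Compute discounted returns G_t backwards from the end of the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    # Normalizing returns is a common variance-reduction trick; the high
    # variance of raw Monte Carlo returns is exactly what makes the
    # training curve so noisy.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```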
python3 a2c.py
The advantage actor-critic model worked quite well. It appeared to have some difficulties in the middle of training, but it eventually recovered. Had I let it train longer, I expect it could have reached a significantly higher score.
When comparing the two, I expect the A2C model had a higher ceiling than the REINFORCE model. It makes sense that A2C took longer to train: it had to train both an actor and a critic, which also means a larger overall network than REINFORCE's single policy network.
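To make the actor/critic split concrete, here is a sketch of a typical A2C loss, assuming PyTorch and batched tensors (the coefficients are common defaults, not necessarily what a2c.py uses):

```python
import torch
import torch.nn.functional as F

def a2c_loss(log_probs, values, returns, entropies,
             value_coef=0.5, entropy_coef=0.01):
    """Combined actor + critic loss for one batch of transitions."""
    # Advantage: how much better the observed return was than the
    # critic's estimate. Detached so the actor gradient doesn't flow
    # into the critic.
    advantages = returns - values
    actor_loss = -(log_probs * advantages.detach()).mean()
    # Critic: regress the value estimate toward the observed return.
    critic_loss = F.mse_loss(values, returns)
    # Entropy bonus encourages exploration (a common A2C addition).
    return actor_loss + value_coef * critic_loss - entropy_coef * entropies.mean()
```

The extra critic head and the value-loss term are why A2C has more parameters to fit than REINFORCE, which matches the longer training time observed above.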