Om2005Prakash/Easy21

Easy21

This repository contains a solution to the Easy21 assignment from David Silver's Reinforcement Learning course: https://www.davidsilver.uk/teaching/

How can you use this to generate your own results?
Run all the cells of the notebook in Jupyter Notebook. If you don't have Jupyter Notebook, I recommend installing it and opening the notebook there; alternatively, install the Jupyter extension in VS Code and run the cells from there. Just make sure you have the following packages installed:

1. NumPy
2. Matplotlib
3. tqdm
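
As a quick sanity check before opening the notebook, something like the following can report which of the three packages are missing. This is only a sketch: the helper name `missing_packages` is made up for illustration and is not part of this repository.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
REQUIRED = ["numpy", "matplotlib", "tqdm"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Anything reported as missing can then be installed with your usual package manager before running the cells.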

Results

Monte Carlo Control

MC V MC Pi

SarsaLam

SarsaLam sweep

SarsaLam Lam=0 SarsaLam Lam=0

SarsaLam V  Lam=0 SarsaLam Pi Lam=0 MSE

SarsaLam V  Lam=1 SarsaLam Pi Lam=1 MSE

Linear Function Approximator

LinApprox sweep

LinApprox Lam=0 LinApprox Lam=1

LinApprox V  Lam=0 LinApprox Pi Lam=0

LinApprox V  Lam=1 LinApprox Pi Lam=1

Answer to Question 5

Q What are the pros and cons of bootstrapping in Easy21?

Pros  
Much faster learning: MC takes 1,000,000 episodes to reach about 53% wins, whereas
SarsaLam takes around 30,000 episodes to get close to 51%, and Func_App with SarsaLam takes
fewer than 100,000 episodes to get close to 51%. In contrast, a random agent wins about 45% of the time.

Cons  
Results vary wildly depending on the initialization of the weights
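
To make the random-agent baseline mentioned above concrete, here is a minimal sketch of the Easy21 rules as stated in the assignment (cards 1 to 10, black with probability 2/3 and red with probability 1/3, dealer sticks on 17 or more) together with a uniformly random player. It is not the notebook's code: the names `draw_card`, `step` and `random_agent_win_rate` are hypothetical, and a "win" is assumed to mean a final reward of +1.

```python
import random

HIT, STICK = 0, 1


def draw_card(rng):
    """Signed card value: 1-10, added (black) w.p. 2/3, subtracted (red) w.p. 1/3."""
    value = rng.randint(1, 10)
    return value if rng.random() < 2 / 3 else -value


def is_bust(total):
    return total < 1 or total > 21


def step(state, action, rng):
    """Apply one action to state = (dealer_sum, player_sum).

    Returns (next_state, reward, done).
    """
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # STICK: the dealer keeps hitting until reaching 17 or more, or going bust.
    while 1 <= dealer < 17:
        dealer += draw_card(rng)
    if is_bust(dealer) or player > dealer:
        reward = 1
    elif player < dealer:
        reward = -1
    else:
        reward = 0
    return (dealer, player), reward, True


def random_agent_win_rate(episodes=10_000, seed=0):
    """Fraction of episodes a uniformly random player ends with reward +1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        # Both dealer and player start with one black card.
        state = (rng.randint(1, 10), rng.randint(1, 10))
        done = False
        while not done:
            state, reward, done = step(state, rng.choice((HIT, STICK)), rng)
        wins += reward == 1
    return wins / episodes


if __name__ == "__main__":
    print(f"random agent win rate: {random_agent_win_rate():.3f}")
```

The same `step` function could serve as the environment for the MC and SarsaLam agents, so all win rates are measured under identical rules.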

Q Would you expect bootstrapping to help more in blackjack or Easy21? Why?

Blackjack. The Blackjack MDP is much simpler than Easy21: it has fewer states (keeping track of
sums below 11 is pointless, as we cannot lose by hitting in those states), and there are fewer
transitions out of a given state (in Blackjack the sum can only increase, whereas in
Easy21 it can also decrease). Because of this simplicity, I think bootstrapping would be
more stable in Blackjack.

Q What are the pros and cons of function approximation in Easy21?

Pros
Very fast learning (it has far fewer features, 36, compared to the 210 states of the tabular case)

Cons
Very sensitive to the randomness of the environment as well as to initialization
It learns a different policy than SarsaLam and MC, as it doesn't have fine-grained control over individual states
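
The 36 features come from the coarse coding given in the assignment: 3 overlapping dealer intervals times 6 overlapping player intervals times 2 actions. A minimal sketch of that feature vector follows; the function names and the feature ordering are my own choices for illustration, not necessarily those used in the notebook.

```python
# Overlapping intervals (inclusive) from the Easy21 assignment.
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
ACTIONS = [0, 1]  # 0 = hit, 1 = stick (labels assumed for this sketch)


def features(dealer, player, action):
    """Binary feature vector of length 3 * 6 * 2 = 36 for a state-action pair."""
    phi = []
    for d_lo, d_hi in DEALER_INTERVALS:
        for p_lo, p_hi in PLAYER_INTERVALS:
            for a in ACTIONS:
                active = (
                    d_lo <= dealer <= d_hi
                    and p_lo <= player <= p_hi
                    and a == action
                )
                phi.append(1.0 if active else 0.0)
    return phi


def q_value(weights, dealer, player, action):
    """Linear estimate Q(s, a) = phi(s, a) . weights."""
    return sum(w * x for w, x in zip(weights, features(dealer, player, action)))


if __name__ == "__main__":
    print(len(features(4, 4, 0)))
    # Distinct states can share exactly the same features, which is why the
    # approximator cannot give them different values or actions.
    print(features(1, 1, 0) == features(2, 3, 0))
```

Because states such as (dealer 1, player 1) and (dealer 2, player 3) map to identical feature vectors, the linear approximator must assign them the same value, which is one way to see the loss of fine-grained control noted above.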

Q How would you modify the function approximator suggested in this section to get better results in Easy21?

We can increase the number of intervals in the coarse coding, which would help to represent more distinct states

Using a decaying step size might improve performance (when I tried this with hyperbolic decay
it just diverged wildly; maybe some other decay function would work)
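
For reference, here is a sketch of two step-size schedules that could be compared. The exact form of the hyperbolic decay I tried is not reproduced here; the formulas, the time constant `tau` and the floor `alpha_min` are assumptions for illustration, with the starting value 0.01 taken from the constant step size the assignment suggests for function approximation.

```python
import math


def hyperbolic_decay(alpha0, t, tau):
    """alpha_t = alpha0 / (1 + t / tau): halves after tau steps, long slow tail."""
    return alpha0 / (1.0 + t / tau)


def exponential_decay(alpha0, t, tau):
    """alpha_t = alpha0 * exp(-t / tau): shrinks by a factor e every tau steps."""
    return alpha0 * math.exp(-t / tau)


def with_floor(alpha, alpha_min):
    """Keep the step size from decaying below alpha_min."""
    return max(alpha, alpha_min)


if __name__ == "__main__":
    alpha0, tau = 0.01, 1000
    for t in (0, 1000, 10_000):
        h = hyperbolic_decay(alpha0, t, tau)
        e = with_floor(exponential_decay(alpha0, t, tau), 1e-4)
        print(f"t={t:>6}  hyperbolic={h:.5f}  exponential(floored)={e:.5f}")
```

Whether any of these schedules actually avoids the divergence would have to be checked empirically; the floor is only there so that learning does not stall once the step size becomes very small.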
