This repository contains a solution to the Easy21 assignment from the Reinforcement Learning course by David Silver: https://www.davidsilver.uk/teaching/
How can you use this to generate your own results?
You can simply run all the cells of the notebook in Jupyter Notebook. If you don't have Jupyter Notebook, I recommend
installing it and opening the notebook there; alternatively, you can install the Jupyter extension in VS Code and run the
cells there. Just make sure you have the following packages installed.
1 NumPy
2 Matplotlib
3 tqdm
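If you want to confirm the packages are importable before opening the notebook, a minimal sketch like the one below should do. The helper name `missing_packages` is hypothetical and is not part of the notebook.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["numpy", "matplotlib", "tqdm"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Anything reported as missing can be installed with your usual package manager before running the cells.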
Q What are the pros and cons of bootstrapping in Easy21?
Pros
Much faster learning: MC takes 1,000,000 episodes to reach a win rate of about 53%, whereas
SarsaLam takes around 30,000 episodes to reach close to 51%, and Func_App with SarsaLam takes
fewer than 100,000 episodes to reach a win rate close to 51%. In contrast, a random agent wins about 45% of the time.
Cons
Results vary widely depending on how the weights are initialized
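To make the trade-off concrete, here is a minimal sketch of the two kinds of update target. It uses a one-step Sarsa target for brevity rather than the full Sarsa(lambda) used in the notebook, and the function names are my own, not the notebook's. The MC target waits for the whole episode, while the bootstrapped target leans on the current estimate `next_q`, which is why it learns faster but inherits whatever error the initial values contain.

```python
def mc_target(rewards, gamma=1.0):
    """Monte Carlo target: the full discounted return observed until the episode ends."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sarsa_target(reward, next_q, terminal, gamma=1.0):
    """One-step Sarsa target: bootstraps from the current estimate of the next state-action value."""
    return reward if terminal else reward + gamma * next_q


def update(q, target, alpha):
    """Move the estimate q a fraction alpha of the way towards the target."""
    return q + alpha * (target - q)
```

For example, `update(0.0, sarsa_target(0, 0.4, False), 0.1)` nudges the estimate towards 0.4 after a single step, even though no reward has been seen yet.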
Q Would you expect bootstrapping to help more in blackjack or Easy21 ? Why?
Blackjack. The Blackjack MDP is much simpler than the Easy21 MDP because it has fewer states (keeping track of
sums below 11 is pointless, as we cannot lose by hitting from those states). A given state also has
fewer possible transitions in Blackjack: there the sum can only increase, whereas in
Easy21 it can also decrease. Because Blackjack is simpler, I think bootstrapping would be
more stable in Blackjack.
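The point about transitions can be seen in a short sketch of the Easy21 draw rule as stated in the assignment: card values are uniform on 1 to 10, red cards (probability 1/3) are subtracted and black cards (probability 2/3) are added, and the player goes bust above 21 or below 1. The function names and the `rng` parameter are my own choices for illustration.

```python
import random


def draw_card(rng=random):
    """Return a signed card value: positive for black, negative for red."""
    value = rng.randint(1, 10)                   # uniform on 1..10, inclusive
    sign = -1 if rng.random() < 1 / 3 else 1     # red with probability 1/3
    return sign * value


def hit(player_sum, rng=random):
    """Apply one hit and report the new sum and whether the player went bust."""
    new_sum = player_sum + draw_card(rng)
    bust = new_sum < 1 or new_sum > 21
    return new_sum, bust
```

Because the sign can be negative, a player sum of 15 can move to anything from 5 to 25, so each state has roughly twice as many successors as in Blackjack.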
Q What are the pros and cons of function approximation in Easy21?
Pros
Very fast learning (it has far fewer features, 36, compared with the 210 states of the lookup table)
Cons
Very sensitive to the randomness of the environment as well as to the initialization
It learns a different policy than SarsaLam and MC, as it doesn't have fine-grained control over individual states
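The loss of fine control comes from the coarse coding itself. Below is a minimal sketch of the 36-dimensional binary feature vector from the assignment (3 dealer intervals x 6 player intervals x 2 actions). It returns a plain list for simplicity, and the names are my own rather than the notebook's.

```python
# Overlapping intervals from the assignment text (both ends inclusive).
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
ACTIONS = ["hit", "stick"]


def features(dealer, player, action):
    """Return the binary feature vector phi(s, a) as a list of 36 floats."""
    a = ACTIONS.index(action)
    phi = []
    for d_lo, d_hi in DEALER_INTERVALS:
        for p_lo, p_hi in PLAYER_INTERVALS:
            for k in range(len(ACTIONS)):
                active = (d_lo <= dealer <= d_hi
                          and p_lo <= player <= p_hi
                          and k == a)
                phi.append(1.0 if active else 0.0)
    return phi
```

Two different states such as (dealer 2, player 3) and (dealer 3, player 2) map to exactly the same vector, so a linear approximator must assign them the same value and cannot learn separate actions for them.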
Q How would you modify the function approximator suggested in this section to get better results in Easy21?
We can increase the number of intervals in the coarse coding, which would let the approximator distinguish more states
Using a decaying step size might improve performance (when I tried this with hyperbolic decay,
it diverged wildly; perhaps some other decay function would work better)
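As a starting point for such experiments, here is a sketch of two step-size schedules. The exact form and constants of the hyperbolic decay I tried are not recorded here, so `tau`, `rate` and `floor` are illustrative assumptions, and the exponential schedule is only one possible alternative, not a tested fix.

```python
import math


def hyperbolic_decay(alpha0, t, tau=1000.0):
    """Step size alpha0 / (1 + t / tau); halves after tau steps."""
    return alpha0 / (1.0 + t / tau)


def exponential_decay(alpha0, t, rate=1e-4, floor=1e-3):
    """Step size alpha0 * exp(-rate * t), never dropping below floor."""
    return max(floor, alpha0 * math.exp(-rate * t))
```

Here `t` could be the episode index or the visit count of a feature; which one works better in Easy21 is something to check empirically.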