Easy21

This repository contains a solution to the Easy21 assignment from the Reinforcement Learning course by David Silver: https://www.davidsilver.uk/teaching/

How can you use this to generate your own results?
Run all the cells of each notebook in Jupyter Notebook. If you don't have Jupyter Notebook, I recommend installing it and opening the notebooks there; alternatively, install the Jupyter extension in VS Code and run them from the editor. Just make sure the following packages are installed.

1. NumPy
2. Matplotlib
3. tqdm
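
Before opening the notebooks, a quick check along these lines can confirm that the three packages are importable. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration and is not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names as they are imported, not as they are displayed in the list above.
missing = missing_packages(["numpy", "matplotlib", "tqdm"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Anything reported as missing can be installed with your usual package manager before running the cells.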

Results:-

Monte Carlo Control

SarsaLam

Linear Function Approximator

Answer to Question 5

Q What are the pros and cons of bootstrapping in Easy21?

Pros  
Much faster learning: MC takes 1,000,000 episodes to reach a win rate of about 53%, whereas
SarsaLam takes around 30,000 episodes to get close to 51%, and Func_App with SarsaLam takes
fewer than 100,000 episodes to get close to 51%. For comparison, a random agent wins about 45% of the time.

Cons  
Results vary widely depending on the initialization of the weights.
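
To make "bootstrapping" concrete, here is a minimal sketch of one backward-view Sarsa(lambda) update on a tabular Q. It is not the code from the notebooks; the function name `sarsa_lambda_step`, the array layout, and the parameter names are assumptions chosen for illustration. The key line is the target, which uses the current estimate of the next state-action value instead of waiting for the full return as MC does.

```python
import numpy as np


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha, lam, gamma=1.0, terminal=False):
    """Apply one backward-view Sarsa(lambda) update to Q and E in place.

    Q and E are arrays of shape (n_states, n_actions) holding the action
    values and the eligibility traces. Returns the TD error.
    """
    # Bootstrapped target: reward plus the current estimate of the next pair.
    target = r if terminal else r + gamma * Q[s_next, a_next]
    delta = target - Q[s, a]

    E[s, a] += 1.0          # accumulating trace for the visited pair
    Q += alpha * delta * E  # every recently visited pair shares the TD error
    E *= gamma * lam        # traces fade, so older pairs get less credit
    return delta


# Tiny usage example with 3 states and 2 actions.
Q = np.zeros((3, 2))
E = np.zeros((3, 2))
sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0,
                  alpha=0.5, lam=0.5, terminal=True)
```

Because the target depends on Q itself, early estimates feed into later ones, which is consistent with the sensitivity to initialization noted above.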

Q Would you expect bootstrapping to help more in blackjack or Easy21 ? Why?

Blackjack. The Blackjack MDP is much simpler than Easy21 because it has fewer states (keeping track of
states less than 11 is pointless, as we cannot go bust by hitting in those states). There are also fewer
transitions out of a given state: in Blackjack a hit can only increase our sum, whereas in
Easy21 it can also decrease it. Because of this simplicity, I think bootstrapping would be
more stable in Blackjack.
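
The point about transitions can be seen in a small sketch of the Easy21 draw rule as described in the assignment: card values 1 to 10, black cards (probability 2/3) are added and red cards (probability 1/3) are subtracted, and the player goes bust above 21 or below 1. The function names `draw_card` and `hit` are assumptions for illustration and may not match `env.py`.

```python
import random


def draw_card(rng):
    """Draw one Easy21 card as a signed value: positive for black, negative for red."""
    value = rng.randint(1, 10)                # uniform over 1..10 inclusive
    sign = 1 if rng.random() < 2 / 3 else -1  # black with probability 2/3
    return sign * value


def hit(player_sum, rng):
    """Return (new_sum, bust) after the player hits once."""
    new_sum = player_sum + draw_card(rng)
    bust = new_sum > 21 or new_sum < 1        # the sum can leave the range on either side
    return new_sum, bust


rng = random.Random(0)
samples = [draw_card(rng) for _ in range(1000)]
print("went down on", sum(d < 0 for d in samples), "of 1000 draws")
```

From any sum, a hit can therefore move the player to up to 20 different sums, half of them lower than the current one.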

Q What are the pros and cons of function approximation in Easy21?

Pros
Very fast learning (it has far fewer features: 36, compared to 210)

Cons
Very sensitive to the randomness of the environment as well as to the initialization
It learns a different policy than SarsaLam and MC, as it doesn't have fine-grained control over individual states
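
For reference, the 36 features come from the overlapping intervals given in the assignment: 3 dealer intervals, 6 player intervals, and 2 actions. The sketch below builds that binary feature vector; the function name `features` and the action encoding (0 = hit, 1 = stick) are assumptions and may differ from the notebook.

```python
import numpy as np

# Overlapping intervals from the Easy21 assignment (inclusive bounds).
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
N_ACTIONS = 2  # assumed encoding: 0 = hit, 1 = stick


def features(dealer, player, action):
    """Return the 36-dimensional binary feature vector for (dealer, player, action)."""
    phi = np.zeros((len(DEALER_INTERVALS), len(PLAYER_INTERVALS), N_ACTIONS))
    for i, (d_lo, d_hi) in enumerate(DEALER_INTERVALS):
        for j, (p_lo, p_hi) in enumerate(PLAYER_INTERVALS):
            if d_lo <= dealer <= d_hi and p_lo <= player <= p_hi:
                phi[i, j, action] = 1.0
    return phi.ravel()


# With a weight vector w of length 36, the value estimate is a dot product.
w = np.zeros(36)
q_hit = features(dealer=4, player=4, action=0) @ w
```

Since several states activate the same features, an update for one state also moves the estimates of its neighbours. That sharing speeds up learning but is also why the approximator lacks fine control over individual states.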

Q How would you modify the function approximator suggested in this section to get better results in Easy21?

We can increase the number of intervals in the coarse coding, which would let the approximator distinguish more states.

Using a decaying step size might improve performance (when I tried this with hyperbolic decay,
it diverged wildly; some other decay function may work better).
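
One alternative to hyperbolic decay is an exponential schedule with a lower bound, sketched below. This has not been tested on Easy21 here, so whether it avoids the divergence is an open question. The function name and all default values are illustrative assumptions, except that 0.01 is the constant step size the assignment suggests for function approximation.

```python
def exponential_step_size(episode, alpha0=0.01, decay=0.9999, alpha_min=1e-4):
    """Step size that shrinks geometrically per episode but never drops below alpha_min."""
    return max(alpha_min, alpha0 * decay ** episode)


# The step size starts at alpha0 and levels off at alpha_min.
for episode in (0, 10_000, 100_000):
    print(episode, exponential_step_size(episode))
```

The floor keeps the agent learning late in training instead of freezing the weights once the step size becomes negligible.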
