This repository has been archived by the owner on Jul 21, 2020. It is now read-only.

week06_policy_based


Materials

Slides
Video lecture by D. Silver - video
Our lecture, seminar (PyTorch), seminar (Theano)
Alternative lecture by J. Schulman part 1 - video
Alternative lecture by J. Schulman part 2 - video
Andrej Karpathy's post on policy gradients

More materials

Actually proving the policy gradient for discounted rewards - article
On variance of policy gradient and optimal baselines: article, another article
Learn Advantage Actor Critic with a comic
Generalizing log-derivative trick - url
Combining policy gradient and q-learning - arxiv
Variational perspective on reinforcement learning (from DeepBayes) - pdf
Adversarial review of policy gradient - blog

Run the seminar notebook in Colab:

Run the optional homework notebook in Colab: