This tutorial walks through a simple example of how to build a policy-gradient-based agent that can solve the CartPole problem.
The agent takes in an observation of the environment and chooses actions that maximize reward not just in the present, but over the long run.
To take reward over time into account, we need to update our agent using more than one experience at a time. To accomplish this, we collect experiences in a buffer and then periodically use them to update the agent all at once. These sequences of experience are sometimes referred to as rollouts or experience traces.
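As a rough illustration, here is a minimal sketch of collecting one such rollout. It assumes the classic Gym `reset`/`step` API and a hypothetical `agent.choose_action` method; these names are assumptions for illustration, not taken from the tutorial's code.

```python
import numpy as np

def collect_rollout(env, agent, max_steps=500):
    """Run one episode and return the experience trace as arrays.

    Assumes `env` follows the classic Gym API (reset/step returning
    obs, reward, done, info) and `agent.choose_action(state)` is a
    hypothetical method that samples an action from the policy.
    """
    states, actions, rewards = [], [], []
    state = env.reset()
    for _ in range(max_steps):
        action = agent.choose_action(state)
        next_state, reward, done, _ = env.step(action)
        # Store the experience in the buffer for a later batched update.
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:
            break
    return np.array(states), np.array(actions), np.array(rewards)
```

Once a rollout (or several) has been gathered this way, the whole buffer can be fed to the agent in a single training step rather than updating after every individual action.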
Rewards are discounted over time, so each action is credited more for the rewards that follow it closely than for rewards far in the future. We use these discounted rewards as an estimate of the advantage in our loss equation.
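Below is a minimal sketch of how the discounted return can be computed for a single episode. The function name `discount_rewards` and the discount factor `gamma = 0.99` are assumptions for illustration; each entry becomes the running sum r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., which is then used in place of the advantage when weighting the log-probability of each action in the loss.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Return the discounted cumulative sum of rewards for one episode.

    Each entry is r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...,
    so early actions receive credit for the rewards that follow them.
    """
    discounted = np.zeros_like(rewards, dtype=np.float64)
    running_sum = 0.0
    # Walk backwards through the episode, accumulating the discounted sum.
    for t in reversed(range(len(rewards))):
        running_sum = running_sum * gamma + rewards[t]
        discounted[t] = running_sum
    return discounted

# Example: a short episode with reward 1 at every step, as in CartPole.
print(discount_rewards(np.array([1.0, 1.0, 1.0])))  # [2.9701, 1.99, 1.0]
```

In practice these discounted values are often normalized (zero mean, unit variance) across the batch before being used as the advantage estimate, which tends to stabilize training.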