PPO With Stein Control Variate

In this work, we propose a control variate method, motivated by Stein's identity, that effectively reduces the variance of policy gradient methods.
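As background, Stein's identity says that for a smooth density π and a well-behaved function φ, E_{a~π}[∇_a log π(a) φ(a) + ∇_a φ(a)] = 0. A term with zero expectation can be subtracted from a gradient estimator without changing its mean, which is what makes it usable as a control variate. The sketch below is not taken from this repository; it only checks the identity numerically for a one-dimensional Gaussian. The function name `stein_residual` and the choice of φ are illustrative assumptions.

```python
import math
import random


def stein_residual(phi, dphi, mu, sigma, n_samples=200000, seed=0):
    """Monte Carlo estimate of E[score(a) * phi(a) + phi'(a)] under N(mu, sigma^2).

    By Stein's identity the exact expectation is zero, so the returned
    value should be close to zero up to sampling noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(mu, sigma)
        # Derivative of log N(a; mu, sigma^2) with respect to a.
        score = -(a - mu) / sigma ** 2
        total += score * phi(a) + dphi(a)
    return total / n_samples


if __name__ == "__main__":
    # phi(a) = sin(a), phi'(a) = cos(a): an arbitrary smooth test function.
    print(stein_residual(math.sin, math.cos, mu=0.5, sigma=1.2))
```

The printed value differs from zero only by Monte Carlo error, which shrinks as `n_samples` grows.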

This repository contains the code for Proximal Policy Optimization (PPO) with Stein control variates for MuJoCo environments.

The code is based on the excellent implementation of PPO.

Dependencies

Python 3.5
MuJoCo
TensorFlow 1.3
Gym - Installation instructions.

Running Experiments

You can run the following commands to reproduce our results:

cd optimization

# For MinVar optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po MinVar -p 500 
 
python train.py Ant-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po MinVar -p 500 


# For FitQ optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po FitQ -p 500 

python train.py Ant-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po FitQ -p 500


# For baseline PPO
python train.py HalfCheetah-v1 -b 10000 -ps large -c 0
python train.py Walker2d-v1 -b 10000 -ps large -c 0
python train.py Hopper-v1 -b 10000 -ps large -c 0

python train.py Ant-v1 -b 10000 -ps small -c 0
python train.py Humanoid-v1 -b 10000 -ps small -c 0
python train.py HumanoidStandup-v1 -b 10000 -ps small -c 0

The log files are in optimization/dartml_data. We also provide two shell scripts for tuning the hyperparameters of the Stein control variates in the scripts folder.
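The commands listed above follow a regular pattern: each environment is paired with a fixed -ps value, and the remaining flags depend only on whether the run is MinVar, FitQ, or the baseline. As a convenience, the sketch below regenerates those command lines in Python. It is not part of the repository; the helper name `build_commands` and the `ENV_POLICY_SIZE` table are hypothetical, and the flag values are copied verbatim from the commands in this README.

```python
# Environment-to-policy-size pairing, copied from the commands in this README.
ENV_POLICY_SIZE = {
    "HalfCheetah-v1": "large",
    "Walker2d-v1": "large",
    "Hopper-v1": "large",
    "Ant-v1": "small",
    "Humanoid-v1": "small",
    "HumanoidStandup-v1": "small",
}


def build_commands(mode):
    """Return the train.py command lines for 'MinVar', 'FitQ', or 'baseline'."""
    if mode in ("MinVar", "FitQ"):
        extra = ["-po", mode, "-p", "500"]
    elif mode == "baseline":
        extra = ["-c", "0"]
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    commands = []
    for env, size in ENV_POLICY_SIZE.items():
        args = ["python", "train.py", env, "-b", "10000", "-ps", size] + extra
        commands.append(" ".join(args))
    return commands


if __name__ == "__main__":
    # Print the commands only; run them from the optimization directory.
    for mode in ("MinVar", "FitQ", "baseline"):
        print("# " + mode)
        print("\n".join(build_commands(mode)))
```

The script only prints the commands, so the output can be reviewed or redirected to a shell script before anything is launched.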

For evaluation of PPO with and without the Stein control variate, please see the evaluation folder.

Citations

If you find Stein control variates helpful, please cite the following paper:

Sample-efficient Policy Optimization with Stein Control Variate. Hao Liu*, Yihao Feng*, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu (*: equal contribution). Preprint 2017

Feedback

If you have any questions about the code or the paper, please feel free to contact us.
