An implementation of the paper "PILCO: A Model-Based and Data-Efficient Approach to Policy Search" by Marc Deisenroth and Carl Rasmussen.
Eric Langlois
pip install -e .[tf_gpu]
# for CPU instead:
# pip install -e .[tf]
These take a while.
python -m pytest
This runs only the faster tests (takes ~30s). To run all tests (20 min), use:
python -m pytest --run-slow
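A flag such as --run-slow is commonly wired up through hooks in a conftest.py file. The sketch below follows the pattern described in the pytest documentation. It is an assumption about how such a flag could be implemented, not a copy of this repository's configuration, and the `slow` marker name is hypothetical.

```python
# conftest.py (hypothetical sketch, not taken from this repository)
import pytest


def pytest_addoption(parser):
    # Register the command-line flag; it is off unless explicitly passed.
    parser.addoption(
        "--run-slow",
        action="store_true",
        default=False,
        help="also run tests marked as slow",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: mark a test as slow to run")


def pytest_collection_modifyitems(config, items):
    # With --run-slow, leave the collected tests untouched.
    if config.getoption("--run-slow"):
        return
    # Otherwise attach a skip marker to every test carrying the slow mark.
    skip_slow = pytest.mark.skip(reason="needs --run-slow to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Under this assumption, a long-running test would be decorated with `@pytest.mark.slow` and skipped unless --run-slow is given.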
The code is automatically formatted with Black. This is enforced with pre-commit checks. Install them with
pip install pre-commit
pre-commit install
Cart-pole environment (defaults for most settings):
./scripts/run-pilco.py --log-level debug --gpu --visualize
Inverted pendulum with logging (in ~/data/pilco by default; change with --root-logdir):
./scripts/run-pilco.py --gpu --visualize --log \
--env InvertedPendulumExtra-v2 \
--random-actions
Available environments:
ContinuousCartPole-v0
InvertedPendulumExtra-v2
InvertedDoublePendulumExtra-v2
SwimmerExtra-v2
Note: Training does not currently seem to work well on the MuJoCo environments (all but ContinuousCartPole).
See the argument descriptions:
./scripts/run-pilco.py --help
Use mbbl-run.py to run on the Model-Based Baseline environments.
The script runs only on Gym environments that define the following environment keys:
- reward.moment_map: The reward function as a moment map.
- initial_state.mean: The initial state mean vector.
- initial_state.covariance: The initial state covariance matrix.
See pilco.rl.envs.gym for examples.
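To make the three keys more concrete, here is a minimal NumPy sketch of the kind of quantities they describe. It does not use this project's API: the dictionary layout, the function names, and the choice of a saturating exponential reward are all assumptions for illustration. The "moment map" is simplified here to a plain function that takes the mean and covariance of a Gaussian state and returns the expected reward in closed form.

```python
import numpy as np


def check_initial_state(mean, covariance):
    """Validate a Gaussian initial state (hypothetical helper)."""
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    if mean.ndim != 1:
        raise ValueError("mean must be a vector")
    if covariance.shape != (mean.size, mean.size):
        raise ValueError("covariance must be square and match the mean")
    if not np.allclose(covariance, covariance.T):
        raise ValueError("covariance must be symmetric")
    if np.linalg.eigvalsh(covariance).min() < -1e-10:
        raise ValueError("covariance must be positive semi-definite")
    return mean, covariance


def expected_saturating_reward(mean, covariance, target, weight):
    """E[exp(-0.5 (x - target)^T W (x - target))] for x ~ N(mean, covariance).

    Closed form: |I + S W|^(-1/2) * exp(-0.5 d^T W (I + S W)^(-1) d),
    where d = mean - target, S is the covariance, and W is the weight matrix.
    """
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    target = np.asarray(target, dtype=float)
    weight = np.asarray(weight, dtype=float)
    scale = np.eye(mean.size) + covariance @ weight  # I + S W
    diff = mean - target
    # d^T W (I + S W)^(-1) d, using a linear solve instead of an explicit inverse.
    quad = diff @ weight @ np.linalg.solve(scale, diff)
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(scale))


# Hypothetical stand-in for the three environment keys (4-dimensional state).
example_keys = {
    "initial_state.mean": np.zeros(4),
    "initial_state.covariance": 0.01 * np.eye(4),
    "reward.moment_map": lambda m, S: expected_saturating_reward(
        m, S, target=np.zeros(4), weight=np.eye(4)
    ),
}

mean, covariance = check_initial_state(
    example_keys["initial_state.mean"],
    example_keys["initial_state.covariance"],
)
initial_expected_reward = example_keys["reward.moment_map"](mean, covariance)
```

With zero covariance the expected reward reduces to the ordinary deterministic reward, which is a convenient sanity check for any moment-based reward.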
This project is distributed under the MIT license (see the LICENSE file).
There are additional copyright notices within code in the pilco/third_party directory.
pip install pyqt5
- The error is sometimes flaky; running again may succeed.
- Try increasing --min-noise
- Try setting --random-actions (might instead make it worse)