Master's thesis: Using Models in Intrinsically Motivated Reinforcement Learning

This repository contains the code for a master's thesis on intrinsically motivated learning in robotics, conducted at the Frankfurt Institute for Advanced Studies under the supervision of Jochen Triesch and Charles Wilmot.

Summary

The goal of this thesis was to investigate how intrinsic motivation can be used to benefit the control of highly complex 7-DOF robot arms.

We first conducted a detailed analysis of how reinforcement learning agents (PPO) without any extrinsic rewards discover and manipulate their environment. We found that exploration driven by intrinsic motivation computed from multiple modalities (proprioception and touch) is much more efficient than exploration that uses either proprioception or touch in isolation. We thus advocate factoring in all available sensor streams when trying to model human-like exploration schemas.
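As a rough illustration of what "intrinsic motivation computed from multiple modalities" can mean, the sketch below sums a per-modality prediction error into one scalar reward. It assumes the intrinsic reward is the mean squared error of a forward model's prediction, which this summary does not state; the function name `intrinsic_reward` and the dictionary layout are hypothetical and are not taken from the repository.

```python
from typing import Dict, Sequence, Tuple


def intrinsic_reward(
    predicted: Dict[str, Sequence[float]],
    observed: Dict[str, Sequence[float]],
    modalities: Tuple[str, ...] = ("proprioception", "touch"),
) -> float:
    """Sum the mean squared prediction error over the selected modalities.

    ASSUMPTION: prediction error as the intrinsic signal is an illustrative
    choice, not necessarily the formulation used in the thesis.
    """
    total = 0.0
    for name in modalities:
        pred, obs = predicted[name], observed[name]
        if len(pred) != len(obs):
            raise ValueError(f"shape mismatch for modality {name!r}")
        # Mean squared error for this sensor stream.
        total += sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)
    return total


# Hypothetical readings: two joint values and one touch sensor.
predicted = {"proprioception": [0.0, 0.0], "touch": [1.0]}
observed = {"proprioception": [1.0, 1.0], "touch": [0.0]}

both = intrinsic_reward(predicted, observed)
proprio_only = intrinsic_reward(predicted, observed, modalities=("proprioception",))
```

Restricting `modalities` to a single entry corresponds to the single-modality conditions compared above.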

The second part of the thesis develops a novel reinforcement learning algorithm that uses a learned inverse model of the environment to reach goals in sparse-reward settings. We find that this approach is an order of magnitude more effective at reaching goals than random exploration. Furthermore, our approach is suited to on-policy learning methods and fulfills a role similar to that of hindsight experience replay (HER) in off-policy settings. It uses a mixture policy: a linear interpolation, with mixing rate $$\alpha$$, between a standard PPO policy and a deep inverse model that is conditioned on goals. Note that $$\alpha = 0$$ reduces to the baseline RL setting with only a PPO policy.
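A minimal sketch of the mixture policy described above, assuming the interpolation is applied directly to action vectors (for example, one value per joint of the 7-DOF arm). The function name `mix_actions` and the example actions are hypothetical; the actual implementation in this repository may differ.

```python
from typing import List, Sequence


def mix_actions(
    ppo_action: Sequence[float],
    inverse_action: Sequence[float],
    alpha: float,
) -> List[float]:
    """Linearly interpolate between a PPO action and an inverse-model action.

    alpha = 0 returns the PPO action unchanged (the baseline RL setting);
    alpha = 1 returns the goal-conditioned inverse-model action.
    ASSUMPTION: mixing happens in action space, element by element.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(ppo_action) != len(inverse_action):
        raise ValueError("both actions must have the same dimension")
    return [(1.0 - alpha) * p + alpha * g for p, g in zip(ppo_action, inverse_action)]


# Hypothetical 7-dimensional actions, one entry per joint.
ppo_action = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
inverse_action = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]

baseline = mix_actions(ppo_action, inverse_action, alpha=0.0)
halfway = mix_actions(ppo_action, inverse_action, alpha=0.5)
```

With `alpha=0.0` the result equals `ppo_action`, matching the note that $$\alpha = 0$$ recovers the PPO-only baseline.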

We also show that when the inverse model is learned from data generated by intrinsically motivated agents, goals are reached even faster and more efficiently. Intrinsic motivation has the biggest impact on performance in settings where goals are harder to reach (farther from the starting point).

How to reproduce the results

TODO
