Jason Yecheng Ma¹, Jason Yan¹, Dinesh Jayaraman¹, Osbert Bastani¹
¹University of Pennsylvania
This is a PyTorch implementation of our paper, "How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via F-Advantage Regression". The code can be used to reproduce Sections 5.1 and 5.2 of the paper.
Here is a teaser video comparing GoFAR against state-of-the-art offline GCRL algorithms on a real robot!
- MuJoCo=2.0.0
- Create conda environment and activate it:
conda env create -f environment.yml
conda activate gofar
pip install --upgrade numpy
pip install torch==1.10.0 torchvision==0.11.1 torchaudio===0.10.0 gym==0.17.3
- (Optionally) install the Robel environment for the D'Claw experiment.
- Download the offline dataset here and place it in the project root directory.
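Once the environment is set up, a quick sanity check can confirm that the pinned packages resolved as expected. The snippet below is a minimal sketch and is not part of this repository: `PINNED` and `check_versions` are hypothetical names, and the version numbers are copied from the pip command in the setup steps.

```python
from importlib import metadata

# Versions copied from the pip install command in the setup steps.
PINNED = {
    "torch": "1.10.0",
    "torchvision": "0.11.1",
    "torchaudio": "0.10.0",
    "gym": "0.17.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only the public version, ignoring local tags such as "+cu113".
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated `gofar` environment; a package that is missing is reported with `found None`.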
We provide commands for reproducing the main GCRL results (Table 1), the ablations (Figure 3), and the stochastic offline GCRL experiment (Figure 4).
- The main results (Table 1) can be reproduced by the following command:
mpirun -np 1 python train.py --env $ENV --method $METHOD
| Flags and Parameters | Description |
| --- | --- |
| --env $ENV | offline GCRL tasks: FetchReach, FetchPush, FetchPick, FetchSlide, HandReach, DClawTurn |
| --method $METHOD | offline GCRL algorithms: gofar, gcsl, wgcsl, actionablemodel, ddpg |
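To cover every task and algorithm combination, the command above can be generated in a loop. The following is a minimal sketch under stated assumptions, not a script shipped with this repository: `build_command` and `run_grid` are hypothetical helpers, and the runs are launched sequentially from the project root. With `dry_run=True` the commands are only printed.

```python
import itertools
import subprocess

# Task and method names copied from the flags table for the main results.
ENVS = ["FetchReach", "FetchPush", "FetchPick", "FetchSlide", "HandReach", "DClawTurn"]
METHODS = ["gofar", "gcsl", "wgcsl", "actionablemodel", "ddpg"]


def build_command(env, method):
    """Build the argument list for one training run."""
    return ["mpirun", "-np", "1", "python", "train.py", "--env", env, "--method", method]


def run_grid(envs=ENVS, methods=METHODS, dry_run=True):
    """Build (and optionally launch, one at a time) every env/method pair."""
    commands = [build_command(e, m) for e, m in itertools.product(envs, methods)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumes the current working directory is the project root.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_grid(dry_run=True)
```

Passing `dry_run=False` executes each run and stops at the first failure because of `check=True`.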
- To run the ablations (Figure 3), adjust the relevant command-line arguments. For example, to disable hindsight experience replay (HER), run:
mpirun -np 1 python train.py --env $ENV --method $METHOD --relabel False
Note that gofar defaults to not using HER, so this command is only relevant to the baselines. Relevant flags are listed here:
| Flags and Parameters | Description |
| --- | --- |
| --relabel | Whether hindsight experience replay is enabled: True, False |
| --relabel_percent | The fraction of minibatch transitions that have relabeled goals: 0.0, 0.2, 0.5, 1.0. These are the values tried in the paper; you may try other fractions too. |
| --f | Choices of f-divergence for GoFAR: kl, chi |
| --reward_type | Choices of reward function for GoFAR: disc, binary |
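To make the meaning of a relabeling fraction concrete, here is a generic illustration of hindsight relabeling with the common "future" strategy. It is a sketch under stated assumptions and may differ from the relabeling logic in this repository: `relabel_batch`, the dictionary keys, and the trajectory layout are all hypothetical. Each transition is relabeled independently with probability `relabel_percent`, so the fraction holds in expectation.

```python
import random


def relabel_batch(batch, trajectories, relabel_percent, rng=random):
    """Relabel each transition's goal with probability `relabel_percent`.

    batch: list of dicts with keys "traj" (trajectory index), "t" (time step)
           and "goal" (the originally commanded goal).
    trajectories: list of lists of achieved goals, one list per trajectory.
    Returns a new list; the input batch is left unchanged.
    """
    relabeled = []
    for tr in batch:
        new_tr = dict(tr)
        achieved = trajectories[tr["traj"]]
        if rng.random() < relabel_percent and tr["t"] + 1 < len(achieved):
            # Pick an achieved goal from a strictly later step of the same trajectory.
            future_t = rng.randint(tr["t"] + 1, len(achieved) - 1)
            new_tr["goal"] = achieved[future_t]
        relabeled.append(new_tr)
    return relabeled


if __name__ == "__main__":
    trajs = [[(0, 0), (1, 0), (2, 0), (3, 0)]]
    batch = [{"traj": 0, "t": 0, "goal": (9, 9)}, {"traj": 0, "t": 2, "goal": (9, 9)}]
    print(relabel_batch(batch, trajs, relabel_percent=0.5))
```

With `relabel_percent=0.0` every goal is kept as commanded, and with `1.0` every transition that has a later step in its trajectory receives a relabeled goal.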
- The following command will run the stochastic environment experiment (Figure 4):
mpirun -np 1 python train.py --env FetchReach --method $METHOD --noise True --noise-eps $NOISE_EPS
where $NOISE_EPS can be chosen from 0.5, 1.0, 1.5.
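The noise sweep can be enumerated the same way. This is a minimal sketch: `noise_commands` is a hypothetical helper, and it assumes that every method accepted by `--method` is also meaningful in the stochastic setting. It only builds the command strings and leaves launching them to you.

```python
import itertools
import shlex

# Method names and noise levels copied from this README.
METHODS = ["gofar", "gcsl", "wgcsl", "actionablemodel", "ddpg"]
NOISE_EPS = [0.5, 1.0, 1.5]


def noise_commands(methods=METHODS, noise_levels=NOISE_EPS):
    """Return one shell command string per (method, noise level) pair."""
    commands = []
    for method, eps in itertools.product(methods, noise_levels):
        args = [
            "mpirun", "-np", "1", "python", "train.py",
            "--env", "FetchReach",
            "--method", method,
            "--noise", "True",
            "--noise-eps", str(eps),
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for line in noise_commands():
        print(line)
```

The printed lines can be pasted into a shell or a job script and run from the project root.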
We borrowed some code from the following repositories: