Jason Yecheng Ma1, Jason Yan1, Dinesh Jayaraman1, Osbert Bastani1
1University of Pennsylvania
This is a PyTorch implementation of our paper How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via F-Advantage Regression; this code can be used to reproduce Sections 5.1 and 5.2 of the paper.
Here is a teaser video comparing GoFAR against state-of-the-art offline GCRL algorithms on a real robot!
- MuJoCo 2.0.0
- Create the conda environment and activate it (a quick sanity check is sketched after this list):

```
conda env create -f environment.yml
conda activate gofar
pip install --upgrade numpy
pip install torch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0 gym==0.17.3
```
- (Optionally) install the Robel environment for the D'Claw experiment.
- Download the offline dataset here and place `offline_data` in the project root directory.
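To verify the setup, a minimal sanity check (assuming `mujoco_py` is the MuJoCo 2.0.0 binding installed by `environment.yml`, and using gym's `FetchReach-v1` as a representative robotics task) is:

```
python -c "import torch; print(torch.__version__)"   # should print 1.10.0
python -c "import mujoco_py"                         # verifies the MuJoCo 2.0.0 binding
python -c "import gym; gym.make('FetchReach-v1')"    # verifies the Fetch robotics tasks
```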
We provide commands for reproducing the main GCRL results (Table 1), the ablations (Figure 3), and the stochastic offline GCRL experiment (Figure 4).
- The main results (Table 1) can be reproduced by the following command:

```
mpirun -np 1 python train.py --env $ENV --method $METHOD
```
| Flags and Parameters | Description |
|---|---|
| `--env $ENV` | offline GCRL tasks: `FetchReach`, `FetchPush`, `FetchPick`, `FetchSlide`, `HandReach`, `DClawTurn` |
| `--method $METHOD` | offline GCRL algorithms: `gofar`, `gcsl`, `wgcsl`, `actionablemodel`, `ddpg` |
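For instance, one row of Table 1 can be reproduced by sweeping all five algorithms on a single task (here `FetchPush`; any task from the table above works the same way):

```
for METHOD in gofar gcsl wgcsl actionablemodel ddpg; do
    mpirun -np 1 python train.py --env FetchPush --method $METHOD
done
```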
- To run the ablations (Figure 3), adjust the relevant command-line arguments. For example, to disable HER:

```
mpirun -np 1 python train.py --env $ENV --method $METHOD --relabel False
```

Note that `gofar` defaults to not using HER, so this flag is only relevant to the baselines. The relevant flags are listed here (example invocations follow the table):
| Flags and Parameters | Description |
|---|---|
| `--relabel` | whether hindsight experience replay is enabled: `True`, `False` |
| `--relabel_percent` | the fraction of minibatch transitions that have relabeled goals: `0.0`, `0.2`, `0.5`, `1.0`; these are the values attempted in the paper, but you may try other fractions too |
| `--f` | choice of f-divergence for GoFAR: `kl`, `chi` |
| `--reward_type` | choice of reward function for GoFAR: `disc`, `binary` |
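For instance (using `FetchPush` as a representative task; any task from the table in the previous section works):

```
# Baseline ablation: WGCSL without hindsight relabeling
mpirun -np 1 python train.py --env FetchPush --method wgcsl --relabel False

# GoFAR ablation: chi divergence with the binary reward
mpirun -np 1 python train.py --env FetchPush --method gofar --f chi --reward_type binary
```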
- The following command will run the stochastic environment experiment (Figure 4):

```
mpirun -np 1 python train.py --env FetchReach --method $METHOD --noise True --noise-eps $NOISE_EPS
```

where `$NOISE_EPS` can be chosen from `0.5, 1.0, 1.5`. A full sweep is sketched below.
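For example, sweeping all three noise levels for GoFAR (substitute any method from the table above):

```
for NOISE_EPS in 0.5 1.0 1.5; do
    mpirun -np 1 python train.py --env FetchReach --method gofar --noise True --noise-eps $NOISE_EPS
done
```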
We borrowed some code from the following repositories: