[NeurIPS 2024] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Qi Wang* · Junming Yang* · Yunbo Wang · Xin Jin · Wenjun Zeng · Xiaokang Yang

Paper | arXiv | Website

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the “test bed” for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.

Getting Started

CoWorld is implemented and tested on Ubuntu 20.04 with Python 3.7 and PyTorch 1.13.1. To set it up:

Create an environment

conda create -n coworld python=3.7
conda activate coworld

Install dependencies

pip install -r requirements.txt

Copy all files in ./modified_dmc_xml to the dm_control suite directory in your conda environment, for example /home/.conda/envs/your_env_name/lib/python3.7/site-packages/dm_control/suite/.
Download the offline dataset here.
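
The copy step above depends on where dm_control is installed, which differs between machines. The following is a minimal sketch, not part of the repository, of how the suite directory could be located and the files copied from Python. The helper names `find_dmc_suite_dir` and `copy_xml_files` are hypothetical, and the sketch assumes that only the top-level files in ./modified_dmc_xml need to be copied and that same-named files in the suite directory should be overwritten.

```python
import importlib.util
import shutil
from pathlib import Path


def find_dmc_suite_dir():
    """Return the dm_control/suite directory of the active environment.

    Hypothetical helper: it relies on dm_control being importable in the
    environment where this script runs.
    """
    try:
        spec = importlib.util.find_spec("dm_control.suite")
    except ModuleNotFoundError:
        spec = None
    if spec is None or spec.origin is None:
        raise ImportError("dm_control is not installed in this environment")
    # For a package, spec.origin points at its __init__.py file.
    return Path(spec.origin).parent


def copy_xml_files(src_dir, dst_dir):
    """Copy every regular file in src_dir into dst_dir.

    Assumption: subdirectories are skipped and same-named files in
    dst_dir are overwritten. Returns the sorted list of copied names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not dst.is_dir():
        raise FileNotFoundError(f"Destination directory not found: {dst}")
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied


# Example usage, run from the repository root with the environment active:
#   copy_xml_files("./modified_dmc_xml", find_dmc_suite_dir())
```

Backing up the original suite directory first may be worthwhile, since the copy replaces files that ship with dm_control.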

Meta-World/RoboDesk/DMC

Training command on Meta-World:

python3 co_training.py --source_task metaworld_drawer-close --target_task metaworld_door-close \
--offline_traindir 'offline_metaworld_data_path' \
--configs defaults metaworld

Training command on RoboDesk:

python3 co_training.py --source_task metaworld_button-press --target_task robodesk_push_green \
--offline_traindir 'offline_robodesk_data_path' \
--configs defaults robodesk

Training command on DMC:

python3 co_training.py --source_task dmc_walker_walk --target_task dmc_walker_run \
--offline_traindir 'offline_dmc_data_path' \
--configs defaults dmc
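
The three commands above share one shape: a source task, a target task, an offline data directory, and a domain-specific config name after `defaults`. As a rough sketch, a small launcher could assemble them from a lookup table. The `EXAMPLES` table and the `build_command` function are hypothetical and not part of the repository. The task names and flags are copied from the commands above, and the data path is a placeholder.

```python
import sys

# Source/target task pairs copied from the three training commands above,
# keyed by the config name passed after "defaults".
EXAMPLES = {
    "metaworld": ("metaworld_drawer-close", "metaworld_door-close"),
    "robodesk": ("metaworld_button-press", "robodesk_push_green"),
    "dmc": ("dmc_walker_walk", "dmc_walker_run"),
}


def build_command(domain, offline_traindir, python=sys.executable):
    """Build the co_training.py argument list for one of the example domains."""
    if domain not in EXAMPLES:
        raise ValueError(f"Unknown domain: {domain!r}")
    source_task, target_task = EXAMPLES[domain]
    return [
        python, "co_training.py",
        "--source_task", source_task,
        "--target_task", target_task,
        "--offline_traindir", offline_traindir,
        "--configs", "defaults", domain,
    ]


# Example usage, run from the repository root:
#   import subprocess
#   subprocess.run(build_command("dmc", "offline_dmc_data_path"), check=True)
```

Passing the arguments as a list avoids shell quoting issues when the dataset path contains spaces.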

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{wang2024making,
  title={Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning}, 
  author={Qi Wang and Junming Yang and Yunbo Wang and Xin Jin and Wenjun Zeng and Xiaokang Yang},
  booktitle={NeurIPS},
  year={2024}
}

Acknowledgement

The code builds on the implementation of dreamer-torch. Thanks to the authors!
