v0.5.1
Env
- add MADDPG PettingZoo example (#774)
- polish NGU Atari configs (#767)
- fix bug in cliffwalking env (#759)
- add PettingZoo replay video demo
- change the default max retry count in the env manager from 5 to 1 (see the config sketch below)
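
A minimal sketch of restoring the previous retry behaviour, assuming the env manager exposes the retry count as a `max_retry` field in its config (the field name is an assumption, not confirmed by this changelog):

```python
# Illustrative sketch only, not the actual change in this release:
# raise the env manager's retry count back up if the new default of 1
# is too strict for a flaky environment.
from easydict import EasyDict

env_manager_cfg = EasyDict(dict(
    type='base',   # use the base (serial) env manager
    max_retry=5,   # restore the previous default; the new default is 1
))
```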
Algorithm
- add QGPO diffusion-model-based algorithm (#757)
- add HAPPO multi-agent algorithm (#717)
- add DreamerV3 + MiniGrid adaptation (#725) (see the MiniGrid sketch after this list)
- fix HPPO entropy_weight to avoid NaN error in log_prob (#761)
- fix structured action bug (#760)
- polish Decision Transformer entry (#754)
- fix EDAC policy/model bug
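
For the DreamerV3 + MiniGrid item above, a minimal sketch of instantiating a MiniGrid environment through Gymnasium; the environment id is chosen only for illustration and this is not the DI-engine entry point:

```python
# Illustrative sketch, not DI-engine code: create a MiniGrid env; an
# adaptation layer (e.g. for DreamerV3) would wrap an env like this one.
import gymnasium as gym
import minigrid  # noqa: F401  # importing registers the MiniGrid-* env ids

env = gym.make("MiniGrid-Empty-8x8-v0")
obs, info = env.reset(seed=0)
# Observations are dicts with 'image', 'direction' and 'mission' keys.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```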
Fix
- fix env typos
- fix pynng requirements bug
- fix communication module unittest bug
Style
- polish policy API doc (#762) (#764) (#768)
- add agent API doc (#758)
- polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)
News
- AAAI 2024: SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Full Changelog: v0.5.0...v0.5.1
Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy