v0.4.7
API Change
- remove the requirement for learn/collect/eval sub fields in the policy config (users can define their own config formats; see the sketch after this list)
- use wandb as the default logger in the task pipeline
- remove the value_network config field and its implementations in SAC and related algorithms
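Below is a minimal sketch of the relaxed policy config. The field names are illustrative rather than the exact DI-engine schema; the point is that the learn/collect/eval sub fields are no longer mandatory, so a flat, user-defined layout is accepted.

```python
# Illustrative policy config without the formerly required learn/collect/eval
# sub fields; the concrete keys below are examples, not the official schema.
from easydict import EasyDict

my_policy_cfg = EasyDict(dict(
    cuda=True,
    discount_factor=0.99,
    batch_size=64,
    learning_rate=1e-3,
    n_sample=96,      # would previously have lived under policy.collect
    eval_freq=1000,   # would previously have lived under policy.eval
))
```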
Env
- add dmc2gym env support and baseline (#451) (see the usage sketch after this list)
- update pettingzoo to the latest version (#597)
- fix icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- add lunarlander continuous TD3/SAC config
- polish lunarlander discrete C51 config
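For the new dmc2gym support, the snippet below shows the upstream dmc2gym package interface that the env wrapper builds on; the DI-engine-side config keys are not shown here and may differ.

```python
# Minimal dmc2gym usage (upstream package API, old gym reset/step convention).
import dmc2gym

env = dmc2gym.make(domain_name='cartpole', task_name='swingup', seed=0)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```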
Algorithm
- add Procedure Cloning (PC) imitation learning algorithm (#514)
- add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- add reward/value norm methods: popart & value rescale & symlog (#605) (reference formulas in the sketch after this list)
- polish reward model config and training pipeline (#624)
- add PPOF reward space demo support (#608)
- add PPOF Atari demo support (#589)
- polish DQN default config and env examples (#611)
- polish comments and clean up code for SAC
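For reference, the value rescale and symlog transforms added in #605 follow the standard definitions sketched below (the in-repo function names may differ, and the stateful popart normalizer is omitted here).

```python
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # symlog(x) = sign(x) * ln(|x| + 1), compresses large-magnitude targets
    return torch.sign(x) * torch.log(torch.abs(x) + 1)

def symexp(x: torch.Tensor) -> torch.Tensor:
    # inverse of symlog
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)

def value_rescale(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x (Pohlen et al., 2018)
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x

def inverse_value_rescale(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    # closed-form inverse of value_rescale
    return torch.sign(x) * (
        ((torch.sqrt(1 + 4 * eps * (torch.abs(x) + 1 + eps)) - 1) / (2 * eps)) ** 2 - 1
    )
```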
Enhancement
- add language model (e.g. GPT) training utils (#625)
- remove policy cfg sub-field requirements (#620)
- add full wandb support (#579) (see the logging sketch after this list)
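The wandb integration ultimately reduces to the standard wandb calls shown below; the DI-engine hook points and config switches are not reproduced here, and the project/metric names are placeholders.

```python
# Plain wandb logging loop (sketch only).
import wandb

wandb.init(project='di-engine-demo', config={'algo': 'dqn', 'env': 'CartPole-v1'})
for step in range(10):
    wandb.log({'train/return': float(step)}, step=step)
wandb.finish()
```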
Fix
- fix confusing shallow copy operation on next_obs (#641) (see the note after this list)
- fix unsqueeze action_args in PDQN when shape is 1 (#599)
- fix evaluator return_info tensor type bug (#592)
- fix deque buffer wrapper PER bug (#586)
- fix reward model save method compatibility bug
- fix logger assertion and unittest bug
- fix bfs test py3.9 compatibility bug
- fix zergling collector unittest bug
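The next_obs fix (#641) addresses a classic aliasing pitfall; the generic example below (not the DI-engine code) shows why a shallow copy of a nested observation can be silently corrupted by later in-place updates.

```python
import copy
import numpy as np

obs = {'state': np.zeros(3)}
transition = {'next_obs': copy.copy(obs)}      # shallow copy shares the array
obs['state'][0] = 1.0
print(transition['next_obs']['state'][0])      # 1.0 -> stored next_obs corrupted

transition = {'next_obs': copy.deepcopy(obs)}  # deep copy avoids the aliasing
obs['state'][0] = 2.0
print(transition['next_obs']['state'][0])      # still 1.0
```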
Style
- add DI-engine torch-rpc p2p communication docker (#628)
- add D4RL docker (#591)
- correct typo in task (#617)
- correct typo in time_helper (#602)
- polish README and add treetensor example
- update contributing doc
New Plan
- call for contributors to DI-engine (#621)
Full Changelog: v0.4.6...v0.4.7
Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear