Refactored Stable Baselines
- refactored A2C, ACER, ACKTR, DDPG, DeepQ, GAIL, TRPO, PPO1 and PPO2 under a single consistent class interface (usage sketches for the new interface follow this list)
- added callback to refactored algorithm training
- added saving and loading to refactored algorithms
- refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies
- added new policies for most algorithms (Mlp, MlpLstm, MlpLnLstm, Cnn, CnnLstm and CnnLnLstm)
- added dynamic environment switching (so continual RL learning is now feasible)
- added prediction from observation and action probability from observation for all the algorithms
- fixed graph issues, so models won't collide in names
- fixed behavior_clone weight loading for GAIL
- fixed TensorFlow allocating all of the GPU VRAM
- fixed models so that they are all compatible with vectorized environments
- fixed `set_global_seed` to update the `gym.spaces` random seed
- fixed PPO1 and TRPO performance issues when learning the identity function
- added new tests for loading, saving, continuous actions and learning the identity function
- fixed DQN wrapping for Atari
- added saving and loading for the VecNormalize wrapper (see the sketch after this list)
- added automatic detection of action space (for the policy network)
- fixed ACER buffer with constant values assuming n_stack=4
- fixed some RL algorithms not clipping the action to be within the `action_space` when using `gym.spaces.Box`
- refactored algorithms can take either a `gym.Env` instance or a `str` (if the environment name is registered)
- hotfix in ACER (compared to v1.0.0)
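
As a rough illustration of the refactored interface, here is a minimal sketch that trains A2C with a callback, passes the environment as a registered name, and saves/reloads the model. The import paths, callback signature and keyword names follow later Stable Baselines releases and are assumptions here, not a definitive description of this release.

```python
from stable_baselines import A2C
from stable_baselines.common.policies import MlpPolicy  # one of the new Mlp/Cnn (Lstm/LnLstm) policies


def callback(locals_, globals_):
    # Called during training; the exact signature used by this release may differ.
    return True


# The environment can be passed as a registered name (str) instead of a gym.Env instance.
model = A2C(MlpPolicy, "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000, callback=callback)

# Saving and loading the refactored algorithms.
model.save("a2c_cartpole")
del model
model = A2C.load("a2c_cartpole")
```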
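
Prediction from an observation and the per-action probabilities can then be queried on any algorithm; the method names `predict` and `action_probability` are taken from later Stable Baselines releases and are assumed here.

```python
import gym

from stable_baselines import A2C

model = A2C.load("a2c_cartpole")  # model saved in the sketch above
env = gym.make("CartPole-v1")
obs = env.reset()

# Action prediction from a single observation (also works with vectorized observations).
action, _states = model.predict(obs)

# Probability of each discrete action under the current policy.
probabilities = model.action_probability(obs)
```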
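
Dynamic environment switching could look roughly like this; `set_env` is the method name from later Stable Baselines releases and is an assumption for this refactor. The new environment must expose the same observation and action spaces.

```python
import gym

from stable_baselines import A2C
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv

model = A2C(MlpPolicy, "CartPole-v1", verbose=0)
model.learn(total_timesteps=5000)

# Swap in a different environment (same spaces) and keep training the same model,
# which makes continual learning across tasks possible.
new_env = DummyVecEnv([lambda: gym.make("CartPole-v0")])
model.set_env(new_env)
model.learn(total_timesteps=5000)
```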
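
For the VecNormalize wrapper, persisting the normalization statistics might look like the following; `save_running_average` / `load_running_average` and the `training` flag are the names used in later Stable Baselines releases and are assumptions for this version.

```python
import os

import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

env = VecNormalize(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
model = PPO2(MlpPolicy, env, verbose=0)
model.learn(total_timesteps=5000)

# Persist the observation/return running averages alongside the model.
stats_dir = "./vec_normalize_stats"
os.makedirs(stats_dir, exist_ok=True)
env.save_running_average(stats_dir)

# Later: recreate the wrapper and restore the statistics before evaluating.
eval_env = VecNormalize(DummyVecEnv([lambda: gym.make("CartPole-v1")]), training=False)
eval_env.load_running_average(stats_dir)
```
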
Future Work:
- Finish refactoring HER
- Refactor ACKTR and ACER to support continuous actions