From 0b7726e0ced9e7caa7ecbbd0104789e33b0afae1 Mon Sep 17 00:00:00 2001
From: Antonin RAFFIN
Date: Thu, 13 Jun 2019 11:02:40 +0200
Subject: [PATCH] Release 2.6.0 (#369)

---
 docs/misc/changelog.rst      | 81 +++++++++++++++++++++++-------------
 setup.py                     |  2 +-
 stable_baselines/__init__.py |  2 +-
 3 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index dbb9c52b80..8d3e017be5 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -6,38 +6,16 @@ Changelog
 For download links, please look at `Github release page `_.
 
-Pre-Release 2.6.0a1 (WIP)
--------------------------
+Release 2.6.0 (2019-06-12)
+--------------------------
 
 **Hindsight Experience Replay (HER) - Reloaded | get/load parameters**
 
-- revamped HER implementation: clean re-implementation from scratch, now supports DQN, SAC and DDPG
-- **deprecated** ``memory_limit`` and ``memory_policy`` in DDPG, please use ``buffer_size`` instead. (will be removed in v3.x.x)
+Breaking Changes:
+^^^^^^^^^^^^^^^^^
+
 - **breaking change** removed ``stable_baselines.ddpg.memory`` in favor of ``stable_baselines.deepq.replay_buffer`` (see fix below)
-- add ``action_noise`` param for SAC, it helps exploration for problem with deceptive reward
-- removed unused dependencies (tdqm, dill, progressbar2, seaborn, glob2, click)
-- Bugfix for ``VecEnvWrapper.__getattr__`` which enables access to class attributes inherited from parent classes.
-- Removed ``get_available_gpus`` function which hadn't been used anywhere (@Pastafarianist)
-- Fixed path splitting in ``TensorboardWriter._get_latest_run_id()`` on Windows machines (@PatrickWalter214)
-- The parameter ``filter_size`` of the function ``conv`` in A2C utils now supports passing a list/tuple of two integers (height and width), in order to have non-squared kernel matrix. (@yutingsz)
-- add ``random_exploration`` parameter for DDPG and SAC, it may be useful when using HER + DDPG/SAC
-  this hack was present in the original OpenAI Baselines DDPG + HER implementation.
-- fixed a bug where initial learning rate is logged instead of its placeholder in ``A2C.setup_model`` (@sc420)
-- fixed a bug where number of timesteps is incorrectly updated and logged in ``A2C.learn`` and ``A2C._train_step`` (@sc420)
-- added ``load_parameters`` and ``get_parameters`` to base RL class.
-  With these methods, users are able to load and get parameters to/from existing model, without touching tensorflow. (@Miffyli)
-- **important change** switched to using dictionaries rather than lists when storing parameters, with tensorflow Variable names being the keys. (@Miffyli)
-- added specific hyperparameter for PPO2 to clip the value function (``cliprange_vf``)
-- fixed ``num_timesteps`` (total_timesteps) variable in PPO2 that was wrongly computed.
-- fixed a bug in DDPG/DQN/SAC, when there were the number of samples in the replay buffer was lesser than the batch size
-  (thanks to @dwiel for spotting the bug)
-- **removed** ``a2c.utils.find_trainable_params`` please use ``common.tf_util.get_trainable_vars`` instead.
-  ``find_trainable_params`` was returning all trainable variables, discarding the scope argument.
-  This bug was causing the model to save duplicated parameters (for DDPG and SAC)
-  but did not affect the performance.
-- added guide for managing ``NaN`` and ``inf``
-- added ``VecCheckNan`` wrapper
-- updated ven_env doc
+
 
 **Breaking Change:** DDPG replay buffer was unified with DQN/SAC replay buffer. As a result,
 when loading a DDPG model trained with stable_baselines<2.6.0, it throws an import error.
@@ -59,6 +37,51 @@ You can fix that using:
 
 We recommend you to save again the model afterward, so the fix won't be needed the next time the trained agent is loaded.
 
+New Features:
+^^^^^^^^^^^^^
+
+- **revamped HER implementation**: clean re-implementation from scratch, now supports DQN, SAC and DDPG
+- add ``action_noise`` param for SAC, it helps exploration for problems with deceptive rewards
+- The parameter ``filter_size`` of the function ``conv`` in A2C utils now supports passing a list/tuple of two integers (height and width), in order to have a non-square kernel matrix. (@yutingsz)
+- add ``random_exploration`` parameter for DDPG and SAC, it may be useful when using HER + DDPG/SAC. This hack was present in the original OpenAI Baselines DDPG + HER implementation.
+- added ``load_parameters`` and ``get_parameters`` to base RL class. With these methods, users are able to load and get parameters to/from an existing model, without touching tensorflow. (@Miffyli)
+- added specific hyperparameter for PPO2 to clip the value function (``cliprange_vf``)
+- added ``VecCheckNan`` wrapper
+
+Bug Fixes:
+^^^^^^^^^^
+
+- bugfix for ``VecEnvWrapper.__getattr__`` which enables access to class attributes inherited from parent classes.
+- fixed path splitting in ``TensorboardWriter._get_latest_run_id()`` on Windows machines (@PatrickWalter214)
+- fixed a bug where initial learning rate is logged instead of its placeholder in ``A2C.setup_model`` (@sc420)
+- fixed a bug where number of timesteps is incorrectly updated and logged in ``A2C.learn`` and ``A2C._train_step`` (@sc420)
+- fixed ``num_timesteps`` (total_timesteps) variable in PPO2 that was wrongly computed.
+- fixed a bug in DDPG/DQN/SAC, when the number of samples in the replay buffer was less than the batch size
+  (thanks to @dwiel for spotting the bug)
+- **removed** ``a2c.utils.find_trainable_params`` please use ``common.tf_util.get_trainable_vars`` instead.
+  ``find_trainable_params`` was returning all trainable variables, discarding the scope argument.
+  This bug was causing the model to save duplicated parameters (for DDPG and SAC)
+  but did not affect the performance.
+
+Deprecations:
+^^^^^^^^^^^^^
+
+- **deprecated** ``memory_limit`` and ``memory_policy`` in DDPG, please use ``buffer_size`` instead. (will be removed in v3.x.x)
+
+Others:
+^^^^^^^
+
+- **important change** switched to using dictionaries rather than lists when storing parameters, with tensorflow Variable names being the keys. (@Miffyli)
+- removed unused dependencies (tqdm, dill, progressbar2, seaborn, glob2, click)
+- removed ``get_available_gpus`` function which hadn't been used anywhere (@Pastafarianist)
+
+Documentation:
+^^^^^^^^^^^^^^
+
+- added guide for managing ``NaN`` and ``inf``
+- updated vec_env doc
+- misc doc updates
+
 Release 2.5.1 (2019-05-04)
 --------------------------
 
@@ -77,7 +100,7 @@ Release 2.5.1 (2019-05-04)
 - added ``get_attr()``, ``env_method()`` and ``set_attr()`` methods for all VecEnv.
   Those methods now all accept ``indices`` keyword to select a subset of envs.
   ``set_attr`` now returns ``None`` rather than a list of ``None``. (@kantneel)
-- ``GAIL``: ``gail.dataset.ExpertDataset` supports loading from memory rather than file, and
+- ``GAIL``: ``gail.dataset.ExpertDataset`` supports loading from memory rather than file, and
   ``gail.dataset.record_expert`` supports returning in-memory rather than saving to file.
 - added support in ``VecEnvWrapper`` for accessing attributes of arbitrarily deeply nested
   instances of ``VecEnvWrapper`` and ``VecEnv``. This is allowed as long as the attribute belongs
diff --git a/setup.py b/setup.py
index dea2d56728..8061b03422 100644
--- a/setup.py
+++ b/setup.py
@@ -137,7 +137,7 @@
       license="MIT",
       long_description=long_description,
       long_description_content_type='text/markdown',
-      version="2.6.0a0",
+      version="2.6.0",
       )
 
 # python setup.py sdist
diff --git a/stable_baselines/__init__.py b/stable_baselines/__init__.py
index 4f32d2ca7b..93c1193ab6 100644
--- a/stable_baselines/__init__.py
+++ b/stable_baselines/__init__.py
@@ -10,4 +10,4 @@
 from stable_baselines.trpo_mpi import TRPO
 from stable_baselines.sac import SAC
 
-__version__ = "2.6.0a0"
+__version__ = "2.6.0"