From dbca48bf7ba343950c7b82cb0f1e0335a5bf4c96 Mon Sep 17 00:00:00 2001
From: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Date: Thu, 29 Aug 2024 23:09:45 +0300
Subject: [PATCH] add documentation for reset_lr feature (#9639) (#10290)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Change default parallel_save to False (#9633)

* Unwrap ckpt_io for model opt (async save) (#9622) (#9634)

* add reset_lr documentation

* fix style

* fix style

* fix style

* add image

* fix typo

* fix plot

* fix plot

* change plot size

* fix style

* move image

* add reset_lr to intro page

---------

Signed-off-by: Mikołaj Błaż
Signed-off-by: dimapihtar
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: mikolajblaz
---
 docs/source/nlp/nemo_megatron/intro.rst       |  1 +
 .../nlp/nemo_megatron/reset_learning_rate.rst | 30 +++++++++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 docs/source/nlp/nemo_megatron/reset_learning_rate.rst

diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst
index 65aaee2add6aa..831edc4bbd420 100644
--- a/docs/source/nlp/nemo_megatron/intro.rst
+++ b/docs/source/nlp/nemo_megatron/intro.rst
@@ -20,6 +20,7 @@ To learn more about using NeMo to train Large Language Models at scale, please r
    peft/landing_page
    positional_embeddings
    mcore_customization
+   reset_learning_rate
    rampup_batch_size
 
 
diff --git a/docs/source/nlp/nemo_megatron/reset_learning_rate.rst b/docs/source/nlp/nemo_megatron/reset_learning_rate.rst
new file mode 100644
index 0000000000000..f89daeeb3907c
--- /dev/null
+++ b/docs/source/nlp/nemo_megatron/reset_learning_rate.rst
@@ -0,0 +1,30 @@
+.. _reset_learning_rate:
+
+Reset Learning Rate
+-------------------
+
+The reset learning rate feature provides the ability to reset the learning rate for an existing checkpoint to its initial value (either 0 or ``optim.min_lr``, depending on the warmup steps) when performing continual pretraining.
+
+Parameters
+----------
+
+* ``reset_lr`` (boolean): Enables resetting the learning rate to its initial value. This feature is only supported with the distributed optimizer and ``megatron_amp_O2``.
+* ``reset_lr_steps`` (boolean): Enables adjusting the learning rate schedule's ``max_steps`` and ``decay_steps`` by subtracting the number of steps already completed at the checkpoint.
+
+Use Cases
+---------
+
+1. ``reset_lr=True, reset_lr_steps=False``
+
+   Use this setting when pretraining an existing checkpoint "from scratch" on a different dataset. The learning rate is reset to its initial value, so the model starts training on the new dataset with the same learning rate dynamics as if it were training from scratch.
+
+2. ``reset_lr=True, reset_lr_steps=True``
+
+   Use this setting when continuing training from an existing checkpoint with the same configuration. The learning rate is reset to its initial value, and the ``max_steps`` and ``decay_steps`` of the learning rate schedule are recalculated by subtracting the number of steps already completed at the checkpoint. Specifically:
+
+   * ``max_steps`` is recalculated as ``max_steps -= completed_steps``.
+   * ``decay_steps`` is recalculated as ``decay_steps -= completed_steps``.
+
+   This ensures that the learning rate reaches the ``min_lr`` value by the end of training without changing ``trainer.max_steps``:
+
+.. image:: https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-reset-learning-rate-example.png
+   :alt: Learning rate schedule when continuing training with reset_lr enabled
+   :width: 1080px
+
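+The schedule adjustment made by ``reset_lr_steps`` amounts to simple arithmetic on the
+scheduler bounds. The sketch below is illustrative only (not NeMo source code) and uses
+made-up numbers:
+
+.. code-block:: python
+
+    completed_steps = 100_000  # steps already recorded in the checkpoint
+    max_steps = 300_000        # trainer.max_steps from the config
+    decay_steps = 250_000      # decay_steps of the LR schedule from the config
+
+    # With reset_lr=True the schedule restarts from its initial value; with
+    # reset_lr_steps=True it is also recomputed over the remaining budget, so
+    # the LR still anneals to min_lr by trainer.max_steps.
+    max_steps -= completed_steps    # 200_000 steps left for the restarted schedule
+    decay_steps -= completed_steps  # 150_000 decay steps left
+
+    print(max_steps, decay_steps)
+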
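+To enable the feature, both flags are set in the ``model`` section of the training configuration.
+The snippet below is a minimal sketch, assuming the keys live under ``model`` as in the GPT
+pretraining config (exact option names and locations may differ between NeMo versions); it only
+builds the relevant overrides rather than a complete training config.
+
+.. code-block:: python
+
+    from omegaconf import OmegaConf
+
+    # Minimal sketch of the options this page documents (not a full NeMo config).
+    overrides = OmegaConf.create(
+        {
+            "model": {
+                "megatron_amp_O2": True,                      # required for reset_lr
+                "optim": {"name": "distributed_fused_adam"},  # distributed optimizer is also required
+                "reset_lr": True,                             # reset the LR to its initial value on restore
+                "reset_lr_steps": True,                       # also shift max_steps/decay_steps by completed steps
+            }
+        }
+    )
+    print(OmegaConf.to_yaml(overrides))
+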