add documentation for reset_lr feature (NVIDIA#9639) (NVIDIA#10290)
* Change default parallel_save to False (NVIDIA#9633)

* Unwrap ckpt_io for model opt (async save) (NVIDIA#9622) (NVIDIA#9634)

* add reset_lr documentation

* fix style

* fix style

* fix style

* add image

* fix typo

* fix plot

* fix plot

* change plot size

* fix style

* move image

* add reset_lr to intro page

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Signed-off-by: Lifu Zhang <[email protected]>
2 people authored and tomlifu committed Oct 25, 2024
1 parent b865cfa commit d29e33b
Showing 2 changed files with 31 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/source/nlp/nemo_megatron/intro.rst
@@ -20,6 +20,7 @@ To learn more about using NeMo to train Large Language Models at scale, please r
peft/landing_page
positional_embeddings
mcore_customization
reset_learning_rate
rampup_batch_size


30 changes: 30 additions & 0 deletions docs/source/nlp/nemo_megatron/reset_learning_rate.rst
@@ -0,0 +1,30 @@
.. _reset_learning_rate:

Reset Learning Rate
-------------------

The reset learning rate feature lets you reset the learning rate of an existing checkpoint to its initial value (either 0 or ``optim.min_lr``, depending on the warmup steps) when performing continual pretraining.

Parameters
----------

* ``reset_lr`` (boolean): Enables resetting the learning rate to its initial value. This feature is only supported with the distributed optimizer and ``megatron_amp_O2``.
* ``reset_lr_steps`` (boolean): Enables adjusting the learning rate schedule's ``max_steps`` and ``decay_steps`` by subtracting the number of steps already completed at the checkpoint.
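
The sketch below shows where these flags might sit in a Megatron GPT pretraining config. The key paths (here assumed to live under ``model``) and the optimizer/scheduler values are illustrative assumptions, not taken from this commit; check your own config for the correct locations.

.. code-block:: yaml

    # Hypothetical excerpt from a Megatron GPT pretraining config.
    # reset_lr requires the distributed optimizer and megatron_amp_O2.
    model:
      megatron_amp_O2: True
      reset_lr: True          # reset the learning rate to its initial value
      reset_lr_steps: False   # see the use cases below for when to enable this
      optim:
        name: distributed_fused_adam
        lr: 2e-4
        sched:
          name: CosineAnnealing
          warmup_steps: 500
          min_lr: 2e-5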

Use Cases
---------

1. ``reset_lr=True, reset_lr_steps=False``
When pretraining an existing checkpoint "from scratch" on a different dataset. The learning rate will be reset to its initial value. This allows the model to start training on a new dataset with the same learning rate dynamics as if it were starting from scratch.

2. ``reset_lr=True, reset_lr_steps=True``
When continuing training from an existing checkpoint with the same configuration. The learning rate will be reset to its initial value, and the ``max_steps`` and ``decay_steps`` for the learning rate schedule will be recalculated by subtracting the number of steps already completed at the checkpoint. Specifically:
* ``max_steps`` will be recalculated as ``max_steps -= completed_steps``.
* ``decay_steps`` will be recalculated as ``decay_steps -= completed_steps``.
This ensures that the learning rate reaches the ``min_lr`` value by the end of training without changing ``trainer.max_steps`` (a brief worked example follows the figure below):

.. image:: https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-reset-learning-rate-example.png
:alt: Reset learning rate example
:width: 1080px
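
As a worked example (with illustrative numbers, not from the original commit): if the checkpoint has already completed 100,000 steps and ``trainer.max_steps`` is 300,000, enabling ``reset_lr_steps`` gives the scheduler ``max_steps = 300,000 - 100,000 = 200,000`` (and likewise for ``decay_steps``), so the learning rate warms up again from its reset value and still decays to ``min_lr`` by step 300,000 of the overall run.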

