add documentation for reset_lr feature (#9639)
* Change default parallel_save to False (#9633)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Unwrap ckpt_io for model opt (async save) (#9622) (#9634)

Signed-off-by: Mikołaj Błaż <[email protected]>

* add reset_lr documentation

Signed-off-by: dimapihtar <[email protected]>

* fix style

Signed-off-by: dimapihtar <[email protected]>

* fix style

Signed-off-by: dimapihtar <[email protected]>

* fix style

Signed-off-by: dimapihtar <[email protected]>

* add image

Signed-off-by: dimapihtar <[email protected]>

* fix typo

Signed-off-by: dimapihtar <[email protected]>

* fix plot

Signed-off-by: dimapihtar <[email protected]>

* fix plot

Signed-off-by: dimapihtar <[email protected]>

* change plot size

Signed-off-by: dimapihtar <[email protected]>

* fix style

Signed-off-by: dimapihtar <[email protected]>

* move image

Signed-off-by: dimapihtar <[email protected]>

* add reset_lr to intro page

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
dimapihtar and mikolajblaz committed Aug 27, 2024
1 parent a922472 commit 6439673
Showing 2 changed files with 32 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/source/nlp/nemo_megatron/intro.rst
@@ -20,6 +20,7 @@ To learn more about using NeMo to train Large Language Models at scale, please r
peft/landing_page
positional_embeddings
mcore_customization
reset_learning_rate


References
@@ -28,4 +29,4 @@ References
.. bibliography:: ../nlp_all.bib
:style: plain
:labelprefix: nlp-megatron
:keyprefix: nlp-megatron-
:keyprefix: nlp-megatron-
30 changes: 30 additions & 0 deletions docs/source/nlp/nemo_megatron/reset_learning_rate.rst
@@ -0,0 +1,30 @@
.. _reset_learning_rate:

Reset Learning Rate
-------------------

The reset learning rate feature lets you reset the learning rate of an existing checkpoint to its initial value (either 0 or ``optim.min_lr``, depending on the warmup steps) when performing continual pretraining.

Parameters
----------

* ``reset_lr`` (boolean): Resets the learning rate to its initial value. This feature is supported only with the distributed optimizer and ``megatron_amp_O2``.
* ``reset_lr_steps`` (boolean): Adjusts the learning rate schedule's ``max_steps`` and ``decay_steps`` by subtracting the number of steps already completed at the checkpoint.
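
The snippet below is a minimal sketch of enabling both options programmatically with OmegaConf. The exact location of these keys (assumed here to be under ``model``) and the config file name are assumptions and may differ between NeMo versions.

.. code-block:: python

    from omegaconf import OmegaConf

    # Load an existing pretraining config (file name is a placeholder).
    cfg = OmegaConf.load("megatron_gpt_config.yaml")

    # Assumed keys: reset the learning rate schedule when resuming from a checkpoint.
    cfg.model.reset_lr = True        # restart the learning rate from its initial value
    cfg.model.reset_lr_steps = True  # shift max_steps/decay_steps by the completed steps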

Use Cases
---------

1. ``reset_lr=True, reset_lr_steps=False``
   Use this when pretraining an existing checkpoint "from scratch" on a different dataset: the learning rate is reset to its initial value, so the model starts training on the new dataset with the same learning rate dynamics as a run started from scratch.

2. ``reset_lr=True, reset_lr_steps=True``
   Use this when continuing training from an existing checkpoint with the same configuration: the learning rate is reset to its initial value, and the ``max_steps`` and ``decay_steps`` of the learning rate schedule are recalculated by subtracting the number of steps already completed at the checkpoint. Specifically:

   * ``max_steps`` is recalculated as ``max_steps -= completed_steps``.
   * ``decay_steps`` is recalculated as ``decay_steps -= completed_steps``.

   This ensures that the learning rate reaches ``min_lr`` by the end of training without changing ``trainer.max_steps``, as shown in the plot below and the sketch that follows it:

.. image:: https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-reset-learning-rate-example.png
:alt: Learning rate behavior when resuming training with reset_lr enabled
:width: 1080px
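
As an illustration only (not NeMo's actual scheduler code), the following sketch shows the ``reset_lr_steps`` arithmetic described above:

.. code-block:: python

    def shifted_schedule(max_steps: int, decay_steps: int, completed_steps: int):
        """Illustrative sketch: shorten the remaining LR schedule by the steps already trained."""
        # With reset_lr_steps=True, the schedule is recomputed so that the learning rate
        # still reaches min_lr at the original trainer.max_steps.
        return max_steps - completed_steps, decay_steps - completed_steps

    # Example: a checkpoint trained for 100,000 steps of a 300,000-step schedule.
    new_max_steps, new_decay_steps = shifted_schedule(300_000, 300_000, 100_000)
    print(new_max_steps, new_decay_steps)  # -> 200000 200000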

