Commit

Make note that ckpt_async_save is disabled for SSMs
Signed-off-by: Shriya Palsamudram <[email protected]>
ShriyaPalsamudram committed Oct 18, 2024
1 parent 1ef88ca commit a0e91c8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion nemo/lightning/pytorch/strategies/megatron_strategy.py
@@ -141,7 +141,7 @@ class MegatronStrategy(DDPStrategy, io.IOMixin):
         ckpt_assume_constant_structure (bool): Allows caching some computation across checkpoint saves.
             Set to True only if the state dict structure doesn't change within a single job.
         ckpt_parallel_save (bool): If true, each worker will write its own part of the dist checkpoint.
-            Defaults to True.
+            Defaults to True. Note that this is set to False for SSMs due to a known bug.
         ckpt_parallel_save_within_dp (bool): If true, save will be parallelized only within a DP group
             (whole world otherwise), which might slightly reduce the save overhead. Defaults to False.
         ckpt_parallel_load (bool): If true, each worker will load part of the dist checkpoint
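
The behavior the updated docstring describes can be sketched as follows. This is a minimal, self-contained illustration and not NeMo's actual code: only the field name `ckpt_parallel_save` and its default of True come from the docstring, while `CheckpointSaveOptions`, `resolve_parallel_save`, and the `is_ssm` flag are hypothetical names introduced here. Where and how NeMo actually applies the override for SSMs is not shown in this commit.

```python
from dataclasses import dataclass


@dataclass
class CheckpointSaveOptions:
    """Hypothetical container; only the field name mirrors the docstring."""

    ckpt_parallel_save: bool = True  # documented default


def resolve_parallel_save(requested: bool = True, is_ssm: bool = False) -> CheckpointSaveOptions:
    """Return the effective setting for a model.

    Assumption: for SSMs the flag is forced to False regardless of what
    the caller requested, matching the note added to the docstring.
    """
    return CheckpointSaveOptions(ckpt_parallel_save=requested and not is_ssm)


if __name__ == "__main__":
    print(resolve_parallel_save())             # non-SSM model, default request
    print(resolve_parallel_save(is_ssm=True))  # SSM model, parallel save forced off
```

Under this assumption, a caller who passes `ckpt_parallel_save=True` for an SSM would still end up with a non-parallel save, which is why the docstring flags the exception next to the default.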
