Change dist ckpt defaults (#10913) #11031

Merged 2 commits, Oct 25, 2024

Commits on Oct 24, 2024

  1. Change dist ckpt defaults (#10913)

    * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * fix ssm tests
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Make note that ckpt_async_save is disabled for SSMs
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Enable async ckpt for SSMs with fix
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Disable async ckpt in the peft test as it is a known bug, add note.
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Fix failing unit tests
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Ashors/peft async ckpt (#11010)
    
    * [WIP] prototype for supporting async checkpointing with peft
    
    Signed-off-by: ashors1 <[email protected]>
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Enable async ckpt for the peft test
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    * Fix peft setup test
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    
    ---------
    
    Signed-off-by: Shriya Palsamudram <[email protected]>
    Signed-off-by: ashors1 <[email protected]>
    Co-authored-by: ataghibakhsh <[email protected]>
    ShriyaPalsamudram and JRD971000 committed Oct 24, 2024
    7b8d334
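
The first bullet of the commit message above combines two behaviours: checkpointing on a wall-clock interval (15 minutes) and saving asynchronously so training is not blocked. The following is a minimal sketch of that idea in plain Python, not NeMo's implementation; `TimedAsyncCheckpointer`, `save_fn`, and the injectable `clock` are hypothetical names introduced only for illustration.

```python
import threading
import time
from datetime import timedelta


class TimedAsyncCheckpointer:
    """Hypothetical helper: save on a wall-clock interval in a background thread."""

    def __init__(self, save_fn, interval=timedelta(minutes=15), clock=time.monotonic):
        self.save_fn = save_fn                    # callable that persists a state dict
        self.interval_s = interval.total_seconds()
        self.clock = clock                        # injectable so tests can fake time
        self.last_save = clock()
        self._pending = None                      # thread of the in-flight save, if any

    def maybe_save(self, state):
        """Start a background save if the interval has elapsed; return True if started."""
        now = self.clock()
        if now - self.last_save < self.interval_s:
            return False
        self.finalize()                           # never overlap two saves
        snapshot = dict(state)                    # shallow copy; training may mutate `state`
        self._pending = threading.Thread(target=self.save_fn, args=(snapshot,))
        self._pending.start()
        self.last_save = now
        return True

    def finalize(self):
        """Block until the in-flight save (if any) has completed."""
        if self._pending is not None:
            self._pending.join()
            self._pending = None
```

In a training loop, `maybe_save(state)` would be called once per step and `finalize()` once before exit, for example when a preemption signal arrives, so that the last save is fully written before the process stops.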

Commits on Oct 25, 2024

  1. a37bcf2