Restart and bestval checkpointing #4

Closed
azton opened this issue Mar 12, 2023 · 0 comments


azton commented Mar 12, 2023

We need two distinct kinds of checkpoint. Restart checkpoints will be sharded, are intended to be reloaded into a DeepSpeed trainer, and must include optimizer states. Best-validation ("best val") checkpoints are saved as single, unsharded models and do not require optimizer states; they are intended as the 'inference pipeline' checkpoints.
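A minimal sketch of the split described above, assuming nothing about the project's actual trainer. Every name here (`save_restart_shard`, `BestValCheckpointer`, the directory and file layout) is hypothetical, and `pickle` stands in for the real serialization. In a DeepSpeed setup the restart path would presumably go through the engine's own `save_checkpoint` / `load_checkpoint` rather than hand-written shard files.

```python
import pickle
import tempfile
from pathlib import Path


def save_restart_shard(ckpt_dir, step, rank, model_shard, optimizer_state):
    """Write one rank's shard of a restart checkpoint (hypothetical layout).

    Restart checkpoints carry optimizer state so training can resume.
    """
    shard_dir = Path(ckpt_dir) / f"restart_step{step}"
    shard_dir.mkdir(parents=True, exist_ok=True)
    path = shard_dir / f"rank{rank:05d}.pkl"
    with open(path, "wb") as f:
        pickle.dump(
            {"step": step, "model": model_shard, "optimizer": optimizer_state}, f
        )
    return path


class BestValCheckpointer:
    """Keep one unsharded, weights-only file for the best validation loss seen."""

    def __init__(self, ckpt_dir):
        self.path = Path(ckpt_dir) / "best_val.pkl"
        self.best = float("inf")

    def maybe_save(self, val_loss, full_model_state):
        # Only overwrite when validation loss strictly improves.
        if val_loss >= self.best:
            return False
        self.best = val_loss
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.path, "wb") as f:
            # No optimizer state: this file is meant for inference only.
            pickle.dump({"val_loss": val_loss, "model": full_model_state}, f)
        return True


# Usage with toy data: two ranks write restart shards, one best-val file is kept.
with tempfile.TemporaryDirectory() as tmp:
    for rank in range(2):
        save_restart_shard(tmp, step=100, rank=rank,
                           model_shard={"w": [float(rank)]},
                           optimizer_state={"exp_avg": [0.0]})
    best = BestValCheckpointer(tmp)
    best.maybe_save(0.42, {"w": [0.0, 1.0]})
    shard_count = len(list((Path(tmp) / "restart_step100").glob("rank*.pkl")))
```

Keeping the two writers separate means the restart path can stay tied to the training world size, while the best-val file remains loadable by an inference process that knows nothing about sharding or the optimizer.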

azton closed this as completed Mar 18, 2023