Add StragglerDetection and FTlauncher to NeMo2.0 #11117
Commits on Nov 15, 2024
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 0e8dd86
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 58b1d42
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit e350f03
-
Add StragglerDetection callback to all NeMo2.0 recipes
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 3fd001b
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit f0ea714
-
Add missing and remove unused imports
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit a637494
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 849872d
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 1af154f
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 874a879
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 8f4a22b
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 6c5df44
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 4f13a42
-
add ft launcher using nemo-run for llama3 test
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 6f80cce
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit e3d29e9
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 140ebbf
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 1f99d40
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 160def9
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit f79d8a6
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit d0aa5d9
-
Signed-off-by: Shriya Balaji Palsamudram <[email protected]>
Commit 8508edb
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 16cebf3
-
Simulate a crash using step, disable checkpointing
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 42ca51b
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit dc11d21
-
Add a straggler detection test as well
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 9519c3c
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 8829909
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 00aa2e1
-
Revert enabling straggler_detection by default in all recipes
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 0aaa9e0
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 3fdcdce
-
Remove extra check in ConfigValidationPlugin
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit b26bd70
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 98a533f
-
Improve straggler detection testing and add doc string
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 166048f
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit 3d0a03d
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit bff35c4
-
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit e6bbb27
-
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <[email protected]>
Commit ba065bd
-
Append run logs to a file after a crash
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 7f27da9
Commits on Nov 18, 2024
-
Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH
Signed-off-by: Shriya Palsamudram <[email protected]>
Commit 6c0857f
Commits on Nov 19, 2024
-
Merge branch 'main' into shriya/resiliency
Signed-off-by: Shriya Rishab <[email protected]>
Commit fd55915