Performance fine-tuning recipes for llama3 8b + 70b #11046
Conversation
Just a few small questions
@@ -273,7 +285,82 @@ def finetune_recipe(
        recipe.optim.config.lr = 5e-6
    elif peft_scheme.lower() == 'lora':
        recipe.peft = run.Config(LoRA)
        recipe.peft.dim = 8
Have you verified that dim=8 would not compromise accuracy?
Rank 8 follows the default from torchtune here; in tests with 8B and 70B, loss decreases smoothly across 50-100 training steps.
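For reference, a minimal sketch of overriding the LoRA rank when building the recipe; it assumes the NeMo 2.0 recipe entry point `llm.llama3_8b.finetune_recipe` and mirrors the attribute path in the diff above, so treat the exact call as an assumption rather than code from this PR:

```python
# Hedged sketch: bump the LoRA rank if dim=8 turns out to be too small for a task.
# Assumes the NeMo 2.0 recipe API; the diff above sets recipe.peft.dim = 8 by default.
from nemo.collections import llm

recipe = llm.llama3_8b.finetune_recipe(peft_scheme="lora")
recipe.peft.dim = 16  # override the default rank of 8 (torchtune's default)
```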
    Use this method with caution and only when you need maximum performance.
    It may not be suitable for all hardware configurations or use cases.
    """
    recipe.trainer.strategy.tensor_model_parallel_size = 1
Just checking: does TP=1 work for full fine-tuning as well?
With BF16 gradients it should. Let me double-check.
Yep, both fit.
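As a rough sketch of the combination discussed above (TP=1 plus BF16 gradient reduction for full fine-tuning); the DDP attribute name follows Megatron-style configs and is an assumption, not copied from this PR:

```python
# Hedged sketch: TP=1 with BF16 gradient reduction for full fine-tuning.
# grad_reduce_in_fp32 is the usual Megatron-core DDP flag; exact field names may differ here.
from nemo.collections import llm

recipe = llm.llama3_8b.finetune_recipe(peft_scheme=None)  # full fine-tuning
recipe.trainer.strategy.tensor_model_parallel_size = 1    # TP=1
recipe.trainer.strategy.ddp.grad_reduce_in_fp32 = False   # keep gradient reduction in BF16
```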
    # Sequence length settings in the model and dataset must agree
    recipe.model.config.seq_length = seq_length
    recipe.data.seq_length = seq_length
Without packed sequences, we usually use seq_length=2048, because most samples in SQuAD are well under 2048 tokens.
Just altered this to use 4K for packed and 2K for unpacked by default.
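To make that default concrete, a small sketch of the packed/unpacked selection described above; the helper name and signature are illustrative, not the PR's actual code:

```python
# Hedged sketch: pick a default sequence length based on packed-sequence mode,
# then keep the model and dataset settings in sync (they must agree).
def apply_seq_length(recipe, packed_sequence: bool, seq_length=None):
    if seq_length is None:
        seq_length = 4096 if packed_sequence else 2048  # 4K packed, 2K unpacked
    recipe.model.config.seq_length = seq_length
    recipe.data.seq_length = seq_length
    return recipe
```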
LGTM
* llama3 finetuning perf recipes progress capture
* Small syntax fix
* syntax
* Apply isort and black reformatting
* Correct ddp setting
* Fix hasattr check
* bf16 grad
* Update configs for 8b + 70b
* Set wgrad_deferral_limit

Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: vysarge <[email protected]>
Co-authored-by: vysarge <[email protected]>
What does this PR do?
Adds performance-oriented fine-tuning recipes (LoRA and full fine-tuning) for Llama 3 8B and 70B, covering the tensor-parallel, DDP/BF16-gradient, and sequence-length settings discussed above.
Collection: llm
Changelog
Usage
# Add a code snippet demonstrating how to use this
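A hedged sketch of how a recipe from this PR might be launched with NeMo-Run; the recipe entry point, arguments, and executor choice are assumptions rather than code taken from this PR:

```python
# Hedged usage sketch: build a LoRA fine-tuning recipe for Llama 3 8B and run it locally.
# Exact recipe names and arguments are assumptions.
import nemo_run as run
from nemo.collections import llm

recipe = llm.llama3_8b.finetune_recipe(
    name="llama3_8b_lora_perf",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme="lora",
)
run.run(recipe, executor=run.LocalExecutor())
```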
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs for various areas.
Additional Information