
support torch dynamo for deepspeed>=0.14.4 #3069

Merged: 4 commits into huggingface:main on Oct 10, 2024

Conversation

@oraluben (Contributor) commented Sep 3, 2024

What does this PR do?

Fixes # (issue)

Related: microsoft/DeepSpeed#6502

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@oraluben oraluben marked this pull request as draft September 3, 2024 02:18
@oraluben oraluben mentioned this pull request Sep 3, 2024
@oraluben oraluben marked this pull request as ready for review September 7, 2024 14:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr (Collaborator) left a comment

Nice! Would it be possible to add a small test in tests/deepspeed/test_deepspeed.py? Single GPU should be good enough. Thanks!

cc @SunMarc to keep an eye on since it's compile related
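(For illustration only: a rough single-GPU sketch of the kind of script such a test could exercise, meant to be run through `accelerate launch --use_deepspeed --dynamo_backend inductor`. The plugin arguments and the toy model are placeholders, not the test added in this PR.)

```python
# Hypothetical DeepSpeed + torch dynamo smoke test; intended to be launched via
# `accelerate launch --use_deepspeed --dynamo_backend inductor this_script.py`.
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin


def main():
    plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
    accelerator = Accelerator(deepspeed_plugin=plugin, dynamo_backend="inductor")

    model = torch.nn.Linear(16, 16)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 16))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    # prepare() wraps the model in the DeepSpeed engine and, with dynamo_backend
    # set, also applies torch dynamo to the model.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for (x,) in dataloader:
        loss = model(x).sum()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()


if __name__ == "__main__":
    main()
```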

@oraluben (Contributor, Author) commented Sep 9, 2024

This is an untested test; I'll run it tomorrow and see whether it actually works.

@SunMarc (Member) left a comment

Thanks for the PR @oraluben! LGTM! Just a nit: you will need to update the deepspeed_launcher to account for dynamo args, as @pacman100 suggested before in #2460 (comment). This will probably be needed in order to pass the tests you added.

Also, I tried launching the script that @pacman100 shared in the previous PR with the following setup:

  • inductor, with TORCHDYNAMO_DEBUG_FUNCTION=forward: same speed as without dynamo
  • inductor, without TORCHDYNAMO_DEBUG_FUNCTION=forward: slow at the beginning, then roughly the same iteration speed as the first one
  • without dynamo

Did you run any benchmarks on your side? It would be nice to have an example that shows a speed increase.
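(For context on that nit, the idea is that the launcher has to forward the dynamo settings to the spawned processes. A rough, hypothetical sketch using accelerate's ACCELERATE_DYNAMO_* environment-variable convention; this is not the actual diff in this PR.)

```python
# Hypothetical illustration: propagate the `accelerate launch` dynamo arguments
# into the DeepSpeed launcher's child-process environment so the Accelerator in
# each worker picks them up. Attribute names mirror the CLI flags.
def add_dynamo_env_vars(current_env: dict, args) -> dict:
    dynamo_backend = getattr(args, "dynamo_backend", "no") or "no"
    if dynamo_backend.lower() != "no":
        current_env["ACCELERATE_DYNAMO_BACKEND"] = dynamo_backend.upper()
        current_env["ACCELERATE_DYNAMO_MODE"] = getattr(args, "dynamo_mode", "default")
        current_env["ACCELERATE_DYNAMO_USE_FULLGRAPH"] = str(getattr(args, "dynamo_use_fullgraph", False))
        current_env["ACCELERATE_DYNAMO_USE_DYNAMIC"] = str(getattr(args, "dynamo_use_dynamic", False))
    return current_env
```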

@SunMarc (Member) commented Sep 9, 2024

Reproduction shared here

Traceback:

  • inductor, without TORCHDYNAMO_DEBUG_FUNCTION=forward
 33%|███████████████████████▎                                              | 229/687 [02:42<01:26,  5.31it/s]
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.7248908296943232, 'f1': 0.8132641719155241}
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.7248908296943232, 'f1': 0.8132641719155241}
 67%|██████████████████████████████████████████████▋                       | 458/687 [03:28<00:45,  5.05it/s]
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.87882096069869, 'f1': 0.9104477611940298}
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.87882096069869, 'f1': 0.9104477611940298}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [04:14<00:00,  5.19it/s]
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9658842794759825, 'f1': 0.9748136207938748}
Training finished.
First iteration took: 40.36s
Average time after the first iteration: 311.71ms
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9658842794759825, 'f1': 0.9748136207938748}
Training finished.
First iteration took: 40.63s
Average time after the first iteration: 311.71ms
  • inductor, with TORCHDYNAMO_DEBUG_FUNCTION=forward
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.724617903930131, 'f1': 0.81276674707738}
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.724617903930131, 'f1': 0.81276674707738}
 67%|██████████████████████████████████████████████▋                       | 458/687 [01:52<00:40,  5.61it/s]
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8820960698689956, 'f1': 0.9130084575110753}
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8820960698689956, 'f1': 0.9130084575110753}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [02:35<00:00,  5.55it/s]
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9598799126637555, 'f1': 0.9704165828134435}
Training finished.
First iteration took: 28.05s
Average time after the first iteration: 186.44ms
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9598799126637555, 'f1': 0.9704165828134435}
Training finished.
First iteration took: 27.78s
Average time after the first iteration: 186.44ms
  • without dynamo
 33%|███████████████████████▎                                              | 229/687 [00:51<01:21,  5.61it/s]
Training Accuracy for backend no at epoch 0: {'accuracy': 0.7254366812227074, 'f1': 0.8123834390152929}
Training Accuracy for backend no at epoch 0: {'accuracy': 0.7254366812227074, 'f1': 0.8123834390152929}
 67%|██████████████████████████████████████████████▋                       | 458/687 [01:33<00:40,  5.62it/s]
Training Accuracy for backend no at epoch 1: {'accuracy': 0.8815502183406113, 'f1': 0.9126409017713366}
Training Accuracy for backend no at epoch 1: {'accuracy': 0.8815502183406113, 'f1': 0.9126409017713366}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [02:15<00:00,  5.61it/s]
Training Accuracy for backend no at epoch 2: {'accuracy': 0.9639737991266376, 'f1': 0.9734513274336283}
Training Accuracy for backend no at epoch 2: {'accuracy': 0.9639737991266376, 'f1': 0.9734513274336283}
Training finished.
First iteration took: 10.38s
Average time after the first iteration: 183.68ms
Training finished.
First iteration took: 9.95s
Average time after the first iteration: 183.68ms

@oraluben (Contributor, Author)
> Did you run any benchmarks on your side? It would be nice to have an example that shows a speed increase.

We've seen an improvement with compiled Llama. My guess is that the demo in test_performance.py contains ops that dynamo does not support (e.g. assertions), so dynamo does not show a significant speedup there.
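(One way to check that guess, as an aside: torch._dynamo.explain reports graph breaks and their reasons for a given callable. A self-contained toy example on a recent PyTorch 2.x, not taken from test_performance.py:)

```python
# Toy illustration of inspecting dynamo graph breaks.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)


def step(x):
    y = model(x)
    if y.sum().item() > 0:  # data-dependent Python branch -> typically a graph break
        y = y * 2
    return y.sum()


explanation = torch._dynamo.explain(step)(torch.randn(4, 8))
print(f"graphs: {explanation.graph_count}, graph breaks: {explanation.graph_break_count}")
for reason in explanation.break_reasons:
    print(reason)
```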

@SunMarc (Member) commented Sep 10, 2024

You mean the text_classification.py script? It is a rather standard training script.
Do you have an example of DP + Dynamo + Llama that I can try to reproduce? Thanks!
This is not a blocker for this PR; I'm just making sure that everything works well on our side.

@oraluben (Contributor, Author)
> Do you have an example of DP + Dynamo + Llama that I can try to reproduce? Thanks!

Sure, I'd love to share, maybe later this week or next week, I'm a little busy now 🫠

@oraluben (Contributor, Author) commented Sep 12, 2024

https://gist.github.com/oraluben/9b8240c2fe482eb4382453d6c97a5f76

TL;DR: a ~10% speedup on Llama, but a DeepSpeed patch is required.

@SunMarc @muellerzr

Update: I just realized that this does not use accelerate directly; it's a transformers-based demo. Is that okay for you?
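(Loosely sketched, the transformers-based combination in question is the Trainer with a DeepSpeed config plus torch_compile. The values and the config path below are placeholders, not the gist's actual settings, and the JSON file must exist.)

```python
# Placeholder sketch of combining DeepSpeed and torch dynamo through the HF Trainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-ds-dynamo",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed="ds_zero2_config.json",   # DeepSpeed ZeRO config (placeholder path)
    torch_compile=True,                 # enable torch dynamo
    torch_compile_backend="inductor",
)
```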

@SunMarc (Member) commented Sep 12, 2024

That's good enough @oraluben! Thanks for the nice reproducer =)

@oraluben (Contributor, Author)
I would prefer not to check in the current version of the test, since it does not represent the best practice for combining accelerate with DeepSpeed and torch dynamo (and it is not even working right now). Based on my demo, do you have any ideas for a better test?

github-actions bot commented Oct 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@muellerzr (Collaborator)
@SunMarc if you're comfortable with this, I'm comfortable with it too, and we can go test-less for now.

@SunMarc (Member) commented Oct 10, 2024

Yeah, for sure, let's merge this since it shouldn't affect users in general. I'll test it later!

@SunMarc SunMarc merged commit cba3f2d into huggingface:main Oct 10, 2024
25 checks passed
@oraluben oraluben deleted the support-deepspeed-dynamo branch October 11, 2024 02:58
@oraluben oraluben mentioned this pull request Oct 11, 2024