
support torch dynamo for deepspeed>=0.14.4 #3069

Merged: 4 commits into huggingface:main on Oct 10, 2024

Conversation

@oraluben (Contributor) commented Sep 3, 2024

What does this PR do?

Fixes # (issue)

Related: microsoft/DeepSpeed#6502

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@oraluben oraluben marked this pull request as draft September 3, 2024 02:18
@oraluben oraluben mentioned this pull request Sep 3, 2024
@oraluben oraluben marked this pull request as ready for review September 7, 2024 14:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr (Collaborator) left a comment

Nice! Would it be possible to add a small test in tests/deepspeed/test_deepspeed.py? Single GPU should be good enough. Thanks!

cc @SunMarc to keep an eye on since it's compile related
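(For illustration only: a rough single-GPU sketch of the kind of script such a test could exercise, meant to be run through `accelerate launch --use_deepspeed --dynamo_backend inductor`. The plugin arguments and the toy model are placeholders, not the test added in this PR.)

```python
# Hypothetical DeepSpeed + torch dynamo smoke test; intended to be launched via
# `accelerate launch --use_deepspeed --dynamo_backend inductor this_script.py`.
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin


def main():
    plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
    accelerator = Accelerator(deepspeed_plugin=plugin, dynamo_backend="inductor")

    model = torch.nn.Linear(16, 16)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 16))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    # prepare() wraps the model in the DeepSpeed engine and, with dynamo_backend
    # set, also applies torch dynamo to the model.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for (x,) in dataloader:
        loss = model(x).sum()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()


if __name__ == "__main__":
    main()
```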

@oraluben (Contributor, Author) commented Sep 9, 2024

This is an untested test; I'll run it tomorrow and see whether it actually works.

@SunMarc (Member) left a comment

Thanks for the PR @oraluben! LGTM! Just a nit: you will need to update the deepspeed_launcher to account for dynamo args, as @pacman100 suggested before in #2460 (comment). This will probably be needed in order to pass the tests you added.

Also, I tried launching the script that @pacman100 shared in the previous PR with the following setup:

  • inductor, with TORCHDYNAMO_DEBUG_FUNCTION=forward: same speed as without dynamo
  • inductor, without TORCHDYNAMO_DEBUG_FUNCTION=forward: slow at the beginning, then roughly the same iteration speed as the first one
  • without dynamo

Did you run any benchmarks on your side? It would be nice to have an example that shows a speed increase.
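(For context on that nit, the idea is that the launcher has to forward the dynamo settings to the spawned processes. A rough, hypothetical sketch using accelerate's ACCELERATE_DYNAMO_* environment-variable convention; this is not the actual diff in this PR.)

```python
# Hypothetical illustration: propagate the `accelerate launch` dynamo arguments
# into the DeepSpeed launcher's child-process environment so the Accelerator in
# each worker picks them up. Attribute names mirror the CLI flags.
def add_dynamo_env_vars(current_env: dict, args) -> dict:
    dynamo_backend = getattr(args, "dynamo_backend", "no") or "no"
    if dynamo_backend.lower() != "no":
        current_env["ACCELERATE_DYNAMO_BACKEND"] = dynamo_backend.upper()
        current_env["ACCELERATE_DYNAMO_MODE"] = getattr(args, "dynamo_mode", "default")
        current_env["ACCELERATE_DYNAMO_USE_FULLGRAPH"] = str(getattr(args, "dynamo_use_fullgraph", False))
        current_env["ACCELERATE_DYNAMO_USE_DYNAMIC"] = str(getattr(args, "dynamo_use_dynamic", False))
    return current_env
```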

@SunMarc (Member) commented Sep 9, 2024

Reproduction shared here

Traceback:

  • inductor, without TORCHDYNAMO_DEBUG_FUNCTION=forward
 33%|███████████████████████▎                                              | 229/687 [02:42<01:26,  5.31it/s]
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.7248908296943232, 'f1': 0.8132641719155241}
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.7248908296943232, 'f1': 0.8132641719155241}
 67%|██████████████████████████████████████████████▋                       | 458/687 [03:28<00:45,  5.05it/s]
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.87882096069869, 'f1': 0.9104477611940298}
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.87882096069869, 'f1': 0.9104477611940298}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [04:14<00:00,  5.19it/s]
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9658842794759825, 'f1': 0.9748136207938748}
Training finished.
First iteration took: 40.36s
Average time after the first iteration: 311.71ms
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9658842794759825, 'f1': 0.9748136207938748}
Training finished.
First iteration took: 40.63s
Average time after the first iteration: 311.71ms
  • inductor, with TORCHDYNAMO_DEBUG_FUNCTION=forward
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.724617903930131, 'f1': 0.81276674707738}
Training Accuracy for backend inductor at epoch 0: {'accuracy': 0.724617903930131, 'f1': 0.81276674707738}
 67%|██████████████████████████████████████████████▋                       | 458/687 [01:52<00:40,  5.61it/s]
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8820960698689956, 'f1': 0.9130084575110753}
Training Accuracy for backend inductor at epoch 1: {'accuracy': 0.8820960698689956, 'f1': 0.9130084575110753}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [02:35<00:00,  5.55it/s]
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9598799126637555, 'f1': 0.9704165828134435}
Training finished.
First iteration took: 28.05s
Average time after the first iteration: 186.44ms
Training Accuracy for backend inductor at epoch 2: {'accuracy': 0.9598799126637555, 'f1': 0.9704165828134435}
Training finished.
First iteration took: 27.78s
Average time after the first iteration: 186.44ms
  • without dynamo
 33%|███████████████████████▎                                              | 229/687 [00:51<01:21,  5.61it/s]
Training Accuracy for backend no at epoch 0: {'accuracy': 0.7254366812227074, 'f1': 0.8123834390152929}
Training Accuracy for backend no at epoch 0: {'accuracy': 0.7254366812227074, 'f1': 0.8123834390152929}
 67%|██████████████████████████████████████████████▋                       | 458/687 [01:33<00:40,  5.62it/s]
Training Accuracy for backend no at epoch 1: {'accuracy': 0.8815502183406113, 'f1': 0.9126409017713366}
Training Accuracy for backend no at epoch 1: {'accuracy': 0.8815502183406113, 'f1': 0.9126409017713366}
100%|██████████████████████████████████████████████████████████████████████| 687/687 [02:15<00:00,  5.61it/s]
Training Accuracy for backend no at epoch 2: {'accuracy': 0.9639737991266376, 'f1': 0.9734513274336283}
Training Accuracy for backend no at epoch 2: {'accuracy': 0.9639737991266376, 'f1': 0.9734513274336283}
Training finished.
First iteration took: 10.38s
Average time after the first iteration: 183.68ms
Training finished.
First iteration took: 9.95s
Average time after the first iteration: 183.68ms

@oraluben (Contributor, Author)
> Did you run any benchmarks on your side? It would be nice to have an example that shows a speed increase.

We've seen an improvement with compiled Llama. My guess is that the demo in test_performance.py contains ops that dynamo does not support (e.g. assertions), so dynamo does not show a significant speedup there.
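(One way to check that guess, as an aside: torch._dynamo.explain reports graph breaks and their reasons for a given callable. A self-contained toy example on a recent PyTorch 2.x, not taken from test_performance.py:)

```python
# Toy illustration of inspecting dynamo graph breaks.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)


def step(x):
    y = model(x)
    if y.sum().item() > 0:  # data-dependent Python branch -> typically a graph break
        y = y * 2
    return y.sum()


explanation = torch._dynamo.explain(step)(torch.randn(4, 8))
print(f"graphs: {explanation.graph_count}, graph breaks: {explanation.graph_break_count}")
for reason in explanation.break_reasons:
    print(reason)
```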

@SunMarc (Member) commented Sep 10, 2024

You mean the text_classification.py script? It is a rather standard training script.
Do you have an example of DP + Dynamo + Llama that I can try to reproduce? Thanks!
This is not a blocker for this PR; I'm just making sure that everything works well on our side.

@oraluben (Contributor, Author)
> Do you have an example of DP + Dynamo + Llama that I can try to reproduce? Thanks!

Sure, I'd love to share, maybe later this week or next week, I'm a little busy now 🫠

@oraluben (Contributor, Author) commented Sep 12, 2024

https://gist.github.com/oraluben/9b8240c2fe482eb4382453d6c97a5f76

TL;DR: a ~10% speedup on Llama, but a DeepSpeed patch is required.

@SunMarc @muellerzr

Update: I just realized that this does not use accelerate directly; it's a transformers-based demo. Is that okay for you?
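(Loosely sketched, the transformers-based combination in question is the Trainer with a DeepSpeed config plus torch_compile. The values and the config path below are placeholders, not the gist's actual settings, and the JSON file must exist.)

```python
# Placeholder sketch of combining DeepSpeed and torch dynamo through the HF Trainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-ds-dynamo",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed="ds_zero2_config.json",   # DeepSpeed ZeRO config (placeholder path)
    torch_compile=True,                 # enable torch dynamo
    torch_compile_backend="inductor",
)
```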

@SunMarc (Member) commented Sep 12, 2024

That's good enough @oraluben! Thanks for the nice reproducer =)

@oraluben (Contributor, Author)
I would prefer not to check in the current version of the test, since it does not represent the best practice for combining accelerate with DeepSpeed and torch dynamo (and it is not even working right now). Based on my demo, do you have any ideas for a better test?

github-actions bot commented Oct 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@muellerzr (Collaborator)
@SunMarc if you're comfortable with this, I'm comfortable with it too, and we can go test-less for now.

@SunMarc (Member) commented Oct 10, 2024

Yeah, for sure, let's merge this since it shouldn't affect users in general. I'll test it later!

@SunMarc SunMarc merged commit cba3f2d into huggingface:main Oct 10, 2024
25 checks passed
@oraluben oraluben deleted the support-deepspeed-dynamo branch October 11, 2024 02:58
@oraluben oraluben mentioned this pull request Oct 11, 2024