[RFC] Support FSDP2 #3231

Open: wants to merge 2 commits into main
@kmehant commented Nov 8, 2024

What does this PR do?

Prototype implementation for porting from FSDP V1 to FSDP V2. There are a couple of open questions in this PR that need comments and discussion.

  1. Do we want to keep FSDP V1 as is and add FSDP V2 alongside it as an experimental option?
  2. If we maintain both versions, should each have its own FSDP plugin and distributed type?
  3. For HF/transformers users who use fsdp_config, how should we let them choose between the versions?
  4. How should we prepare the 2D mesh for HSDP? Should it be an input from the user?
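To make question 4 concrete, here is a minimal sketch of how a 2D (replicate, shard) layout could be derived from a single user input. The helper names `hsdp_mesh_shape` and `hsdp_rank_grid` and the `shard_size` parameter are hypothetical and not part of this PR; the row-major rank layout is an assumption that matches how a device mesh is usually built from consecutive ranks.

```python
from typing import List, Optional, Tuple


def hsdp_mesh_shape(world_size: int, shard_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (replicate, shard) sizes for a 2D HSDP mesh.

    `shard_size` is a hypothetical user input. When it is omitted, all ranks
    go into a single shard group, which degenerates to plain full shard.
    """
    if world_size < 1:
        raise ValueError("world_size must be a positive integer")
    if shard_size is None:
        shard_size = world_size
    if shard_size < 1 or world_size % shard_size != 0:
        raise ValueError(
            f"shard_size={shard_size} must be a positive divisor of world_size={world_size}"
        )
    return world_size // shard_size, shard_size


def hsdp_rank_grid(world_size: int, shard_size: Optional[int] = None) -> List[List[int]]:
    """Lay ranks out row-major: each row is one shard group, each column a replica group."""
    replicate, shard = hsdp_mesh_shape(world_size, shard_size)
    return [[r * shard + s for s in range(shard)] for r in range(replicate)]


if __name__ == "__main__":
    # 8 ranks, sharding within groups of 4, replicating across the 2 groups.
    print(hsdp_mesh_shape(8, 4))
    print(hsdp_rank_grid(8, 4))
```

Under these assumptions, the resulting shape could be handed to `torch.distributed.device_mesh.init_device_mesh` with `mesh_dim_names=("replicate", "shard")`, and the mesh passed as the `mesh` argument of FSDP2's `fully_shard`, which treats a 2D mesh as HSDP. Whether `shard_size` should come from the user or be inferred (for example, from the number of GPUs per node) is exactly the open question.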

Preliminary run of this PR and results

The current version of the PR has been tested for basic functionality (full shard) and compared with the previous FSDP V1 implementation.

Key         Value
Model       Maykeye/TinyLLama-v0
Mesh size   2 GPUs
Sharding    full shard

Memory

[Image: Screenshot 2024-11-09 at 12 50 10 AM]

Loss Parity

[Image: Screenshot 2024-11-09 at 12 59 56 AM]
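A loss-parity comparison like the one above could also be checked programmatically instead of by eye. The sketch below is an assumption about how such a check might look, not something this PR contains: `losses_match`, `max_relative_gap`, and the 1% default tolerance are hypothetical, and the caller would supply per-step losses logged from the FSDP V1 and FSDP V2 runs.

```python
import math
from typing import Sequence


def max_relative_gap(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest per-step relative difference between two loss curves."""
    if len(baseline) != len(candidate):
        raise ValueError("loss curves must have the same number of steps")
    gap = 0.0
    for a, b in zip(baseline, candidate):
        scale = max(abs(a), abs(b))
        if scale > 0.0:
            gap = max(gap, abs(a - b) / scale)
    return gap


def losses_match(
    baseline: Sequence[float], candidate: Sequence[float], rel_tol: float = 1e-2
) -> bool:
    """True when both curves have equal length and agree step by step within rel_tol."""
    if len(baseline) != len(candidate):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(baseline, candidate))
```

The tolerance is a judgment call: sharded runs are rarely bitwise identical because of reduction order, so an exact equality check would be too strict.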

Throughput

TODO

Fixes #2873

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr

Signed-off-by: Mehant Kammakomati <[email protected]>
@raghukiran1224

@ByronHsu FYI - thoughts?
