Suggestion: Integrate DeepSpeed-Ulysses with Head Dimensional Splitting to Form a 5D Parallelism Scheme #12

Open
feifeibear opened this issue Dec 19, 2024 · 1 comment
Labels
features New features

Comments

@feifeibear

Thank you for your excellent work and elegant code!

I noticed that your current implementation already supports various parallelism strategies, but if you could additionally integrate DeepSpeed-Ulysses with head dimensional splitting, it would form a 5D Parallelism scheme. This would further enhance the scalability and efficiency of training large models.

I recommend considering our Ulysses + Ring Unified Sequence Parallelism (USP) approach, as described in our paper USP: A Unified Sequence Parallelism Approach for Long Context Generative AI.

We have a Ulysses + Ring hybrid sequence-parallel implementation here:

https://github.com/feifeibear/long-context-attention
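
For readers unfamiliar with the head-dimension split, here is a minimal single-process sketch of the resharding step that DeepSpeed-Ulysses performs with an all-to-all before attention: each rank starts with a slice of the sequence and all heads, and ends with the full sequence and a slice of the heads. The function name `ulysses_all_to_all` and the nested-list layout are assumptions made for illustration; this is neither Picotron code nor the API of the repository linked above, and a real implementation would use a collective such as `torch.distributed.all_to_all_single` on tensors.

```python
def ulysses_all_to_all(seq_shards):
    """Simulate the Ulysses all-to-all on one process (hypothetical helper).

    seq_shards[r] is rank r's local data indexed [local_token][head]:
    every rank holds all heads for its slice of the sequence.
    Returns head_shards, where head_shards[r] is indexed [token][local_head]:
    every rank holds the full sequence for its slice of the heads.
    """
    world = len(seq_shards)
    num_heads = len(seq_shards[0][0])
    assert num_heads % world == 0, "head count must be divisible by the group size"
    heads_per_rank = num_heads // world

    head_shards = []
    for dst in range(world):
        lo, hi = dst * heads_per_rank, (dst + 1) * heads_per_rank
        # Rank dst receives its head slice from every source rank in rank
        # order, which reassembles the sequence shards into the full sequence.
        full_seq = [token[lo:hi] for src in range(world) for token in seq_shards[src]]
        head_shards.append(full_seq)
    return head_shards


# Toy setup: 2 ranks, 4 tokens, 4 heads; each entry is a (token, head) label.
world, seq_len, num_heads = 2, 4, 4
tokens_per_rank = seq_len // world
seq_shards = [
    [[(t, h) for h in range(num_heads)]
     for t in range(r * tokens_per_rank, (r + 1) * tokens_per_rank)]
    for r in range(world)
]
head_shards = ulysses_all_to_all(seq_shards)
print(head_shards[1][0])  # token 0 as seen by rank 1: heads 2 and 3 only
```

After attention runs on the head-sharded layout, a second all-to-all in the opposite direction restores the sequence-sharded layout. The divisibility assertion reflects the main constraint of this scheme: the Ulysses group size cannot exceed the number of heads, which is one motivation for combining it with ring attention.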

@3outeille
Member

3outeille commented Dec 19, 2024

Hello, thanks for sharing! We will discuss internally whether we want to expand to 5D.

For now, the way @zzhhjjj and I see things is as follows: all interesting features like the one you mentioned should live in a fork of Picotron, so that we avoid bloating the core in the long run (much like nanoGPT, where people hack around on a fork). Once this feature is implemented, we will be glad to reference the fork in the repository README.

@3outeille added the enhancement (New feature or request) and features (New features) labels and removed the enhancement label on Dec 19, 2024