Thank you for your excellent work and elegant code!
I noticed that your current implementation already supports various parallelism strategies, but if you could additionally integrate DeepSpeed-Ulysses with head-dimension splitting, it would form a 5D parallelism scheme. This would further enhance the scalability and efficiency of training large models.
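For context, here is a minimal sketch of the Ulysses-style all-to-all this would add, assuming query/key/value tensors of shape `(batch, local_seq, num_heads, head_dim)` sharded along the sequence dimension. The function name `ulysses_all_to_all` and the `sp_group` handle are illustrative only, not existing Picotron or DeepSpeed APIs:

```python
import torch
import torch.distributed as dist

def ulysses_all_to_all(x: torch.Tensor, sp_group: dist.ProcessGroup) -> torch.Tensor:
    """Swap the sharded dimension from sequence to heads before attention.

    Input:  (batch, seq_len / P, num_heads,     head_dim)  -- sequence-sharded
    Output: (batch, seq_len,     num_heads / P, head_dim)  -- head-sharded
    """
    p = dist.get_world_size(sp_group)
    b, s_local, h, d = x.shape
    assert h % p == 0, "num_heads must be divisible by the sequence-parallel degree"

    # Split heads into P chunks so each rank sends one chunk to every peer:
    # (P, batch, local_seq, heads/P, head_dim)
    send = x.reshape(b, s_local, p, h // p, d).permute(2, 0, 1, 3, 4).contiguous()
    recv = torch.empty_like(send)
    dist.all_to_all_single(recv, send, group=sp_group)

    # Each rank now holds every peer's local sequence chunk for its own head
    # slice; concatenate those chunks back along the sequence dimension:
    # (batch, P * local_seq, heads/P, head_dim)
    return recv.permute(1, 0, 2, 3, 4).reshape(b, p * s_local, h // p, d)
```

After attention, the inverse all-to-all restores the sequence sharding, which is why the head count must be divisible by the sequence-parallel degree.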
Hello, thanks for sharing! We will discuss internally whether we want to expand to 5D.
For now, the way @zzhhjjj and I see things is as follows: all interesting features like the one you mentioned should live in a fork of Picotron so that we avoid bloating the core in the long run (pretty much like nanoGPT, where people hack around on a fork). Once this feature is implemented, we will be glad to reference the fork in the repository README.
I recommend considering our Ulysses + Ring Unified Sequence Parallelism (USP) approach, as described in our paper USP: A Unified Sequence Parallelism Approach for Long Context Generative AI.
We have a Ulysses + Ring hybrid sequence-parallel implementation here: https://github.com/feifeibear/long-context-attention
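For reference, an illustrative sketch of the 2D process-group layout that USP describes, assuming `world_size == ulysses_degree * ring_degree`; the helper name and group construction below are hypothetical, not the long-context-attention API:

```python
import torch.distributed as dist

def build_usp_groups(ulysses_degree: int, ring_degree: int):
    """Partition ranks into Ulysses (head all-to-all) and Ring (KV P2P) groups."""
    world_size = dist.get_world_size()
    assert world_size == ulysses_degree * ring_degree
    rank = dist.get_rank()

    ulysses_group, ring_group = None, None
    # Ranks that differ only in the Ulysses coordinate form an all-to-all group.
    # Ulysses is the innermost (contiguous) dimension so its all-to-all tends to
    # stay within a node.
    for r in range(ring_degree):
        ranks = [r * ulysses_degree + u for u in range(ulysses_degree)]
        group = dist.new_group(ranks)  # must be called on all ranks, same order
        if rank in ranks:
            ulysses_group = group
    # Ranks that differ only in the Ring coordinate exchange KV blocks peer-to-peer.
    for u in range(ulysses_degree):
        ranks = [r * ulysses_degree + u for r in range(ring_degree)]
        group = dist.new_group(ranks)
        if rank in ranks:
            ring_group = group
    return ulysses_group, ring_group
```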