[QUESTION] Splitting big models over multiple GPUs #207
Comments
Same question here.
Last time I checked, this was not very easy to do with pytorch-lightning. We actually used a custom implementation with FSDP to train these larger models (without using pytorch-lightning). I have to double-check whether newer versions support FSDP better than the pytorch-lightning version we currently use (2.2.0.post0). But the short answer: model parallelism is not something we support in the current codebase.
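To make the OOM concrete, here is a rough back-of-the-envelope sketch of why a model this size cannot fit on a single 12GB card when every GPU has to hold a full copy. The numbers are assumptions, not measurements: I'm taking XCOMET-XXL to have roughly 10.7B parameters, treating "12GB" as 12 GiB, and adding a made-up 20% overhead for activations and buffers. The helper names are hypothetical and not part of COMET.

```python
import math

# Assumption: approximate parameter count for XCOMET-XXL (not measured here).
XCOMET_XXL_PARAMS = 10.7e9
GIB = 1024 ** 3


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def needed_gib(num_params: float, bytes_per_param: int,
               overhead_fraction: float = 0.2) -> float:
    """Weights plus a rough, assumed overhead for activations and buffers."""
    return weights_gib(num_params, bytes_per_param) * (1 + overhead_fraction)


def fits_on_one_gpu(num_params: float, bytes_per_param: int,
                    gpu_gib: float) -> bool:
    """True if a full replica of the model fits on a single GPU."""
    return needed_gib(num_params, bytes_per_param) <= gpu_gib


def min_gpus_if_sharded(num_params: float, bytes_per_param: int,
                        gpu_gib: float) -> int:
    """Lower bound on the GPU count if the weights could be split evenly."""
    return math.ceil(needed_gib(num_params, bytes_per_param) / gpu_gib)


if __name__ == "__main__":
    for label, nbytes in (("fp16", 2), ("fp32", 4)):
        print(
            label,
            f"weights={weights_gib(XCOMET_XXL_PARAMS, nbytes):.1f} GiB",
            f"fits_on_12GiB={fits_on_one_gpu(XCOMET_XXL_PARAMS, nbytes, 12)}",
            f"min_gpus={min_gpus_if_sharded(XCOMET_XXL_PARAMS, nbytes, 12)}",
        )
```

Under these assumptions even the fp16 weights are well above 12 GiB, so adding more GPUs only helps if the model is actually sharded (e.g. with FSDP or tensor parallelism), which is the part that is not supported right now.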
An idea: CTranslate2 just integrated tensor parallelism. It also supports XLM-RoBERTa, so I'm wondering if we could adapt the converter a bit so that we could run the model within CT2, which is very fast.
Does it support XLM-R XL? Its architecture also differs from XLM-R.
It seems they have actually improved the documentation a lot: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html
We can adapt it if we have a detailed description somewhere.
When specifying the number of GPUs during inference, is it only for parallelism, or is the model loaded piece-wise over multiple GPUs if it's bigger than a single GPU's memory? For example, I'd like to use XCOMET-XXL, and our cluster has many 12GB GPUs.
At first I thought that the model parts would be loaded onto all GPUs, e.g.:
However I'm getting GPU OOM on the first GPU:
Thank you!
unbabel-comet 2.2.1
pytorch-lightning 2.2.0.post0
torch 2.2.1