Error while fine tuning with peft, lora, accelerate, SFTConfig and SFTTrainer #3230
Comments
I'm pretty sure that if you use
The last working code I have (for 1 GPU only) is the following (only the model part, for clarity):
I ran the code unchanged with 1 and with 4 GPUs (I run it on a remote server managed by Slurm, so I can easily request 1 or 4 GPUs for different jobs).
1 GPU output:
4 GPUs output:
With 1 GPU, training is estimated at 20 h, and with 4 GPUs at 18 h, so there is hardly any difference. I would expect 4 GPUs to take close to 4× less time than 1 GPU. We can also see that only one GPU is used when I request 4. So apparently SFTTrainer does not use all the GPUs when more than one is available. I also tried the following, which people advise adding when using more than 1 GPU:
I get the same result: training takes approximately 18-20 h, just as when I use only 1 GPU.
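One way to check whether the script actually runs as several processes (one per GPU) or as a single process is to read the environment variables that torchrun and accelerate launch (in multi-GPU mode) export to each worker: RANK, LOCAL_RANK and WORLD_SIZE. The following is a minimal sketch under that assumption. The LaunchInfo and launch_info names are hypothetical helpers. Under a plain `python script.py` launch, these variables are absent, and the sketch falls back to single-process defaults.

```python
import os
from dataclasses import dataclass


@dataclass
class LaunchInfo:
    rank: int        # global index of this worker process
    local_rank: int  # index of this worker on the current node (usually its GPU index)
    world_size: int  # total number of worker processes

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main_process(self) -> bool:
        return self.rank == 0


def launch_info(env=None) -> LaunchInfo:
    """Read the per-worker variables exported by torchrun / accelerate launch.

    Missing variables fall back to a single-process setup, which is what
    a plain `python script.py` launch produces.
    """
    env = os.environ if env is None else env
    return LaunchInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    info = launch_info()
    print(f"rank={info.rank} local_rank={info.local_rank} world_size={info.world_size}")
```

Printing this at the top of script.py shows how many processes were actually started. With four GPUs, one would hope to see four lines with world_size=4. A single line with world_size=1 means the job ran as one process.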
After several attempts with different options, I noticed that my code does use multiple GPUs, but I'm seeing some strange behavior. With 1 GPU, training takes about 19.5 hours. With 4 GPUs, the time drops only slightly, to 17.5 hours. With 2 GPUs, however, it takes around 9 hours, which is faster than with 3 GPUs (13 h) or 4 GPUs. The outputs I get:
4 GPU output (~17h30)
3 GPU output (~13h)
2 GPUs output (~9h)
1 GPU output (~19h30)
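The timings above can be summarized as speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n. Ideal data-parallel scaling gives S(n) ≈ n and E(n) ≈ 100%. The sketch below uses the approximate hours reported in this comment. The scaling_table helper is hypothetical.

```python
def scaling_table(times_h: dict[int, float]) -> list[tuple[int, float, float, float]]:
    """Return (n_gpus, hours, speedup, efficiency) rows relative to the 1-GPU run."""
    base = times_h[1]
    rows = []
    for n in sorted(times_h):
        speedup = base / times_h[n]  # S(n) = T(1) / T(n)
        efficiency = speedup / n     # E(n) = S(n) / n
        rows.append((n, times_h[n], speedup, efficiency))
    return rows


# Approximate wall-clock training times reported above, in hours.
reported = {1: 19.5, 2: 9.0, 3: 13.0, 4: 17.5}

for n, t, s, e in scaling_table(reported):
    print(f"{n} GPU(s): {t:5.1f} h  speedup {s:4.2f}x  efficiency {e:6.1%}")
```

On these numbers, 2 GPUs come out slightly above linear (about 2.17x), while 3 and 4 GPUs fall to roughly 50% and 28% efficiency. This makes the irregular scaling described above easy to see at a glance.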
And this is the code I used to show VRAM usage:
I'm trying to understand why my code performs best with 2 GPUs rather than 4. Additionally, based on my console outputs, it seems that only the first GPU is used during trainer.train(). I'm wondering how I can verify GPU utilization from my Python code, since I'm in an environment (
Glad that you got it running, but I'm not sure why you see the bad scaling behavior. Just a minor issue I spotted in your code, though it's unlikely to be the cause: when you pass
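One way to check GPU utilization from inside the job is to call nvidia-smi in its CSV query mode and parse the result. This assumes nvidia-smi is on the PATH of the compute node; it normally ships with the NVIDIA driver. The function names below are hypothetical. The --query-gpu fields and the --format=csv,noheader,nounits flags are standard nvidia-smi options, and with nounits, memory is reported in MiB and utilization in percent.

```python
import shutil
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> list[dict[str, int]]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into one dict per GPU.

    Assumes every field is numeric; an "[N/A]" value would make int() fail.
    """
    fields = QUERY.split(",")
    stats = []
    for line in csv_text.strip().splitlines():
        values = [int(v.strip()) for v in line.split(",")]
        stats.append(dict(zip(fields, values)))
    return stats


def query_gpu_stats() -> list[dict[str, int]]:
    """Return per-GPU utilization (%) and memory (MiB), or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_stats(out)


if __name__ == "__main__":
    for gpu in query_gpu_stats():
        print(
            f"GPU {gpu['index']}: {gpu['utilization.gpu']}% util, "
            f"{gpu['memory.used']}/{gpu['memory.total']} MiB"
        )
```

Calling query_gpu_stats() periodically (for example from a separate thread) while trainer.train() runs shows whether every requested GPU is busy, not just GPU 0.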
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Hi,
I am trying to parallelize training across 4 GPUs (V100, 32 GB VRAM each). I have working code for 1 GPU using LoRA, PEFT, SFTConfig and SFTTrainer. Following some tutorials, I tried adding a few lines from the accelerate library to achieve this, without success.
This is the error I get (it appears 4 times because of the parallelization, but for clarity I include only one occurrence):
This is my code (not all of it, just the part about the model itself, for clarity):
The only code I added between the 1-GPU and 4-GPU versions is:
And I run the code via a bash script:
The 1 GPU version of this script was:
python script.py --model_path $1 --output $2
I also tried adding the following at the end of the main function (after deleting ‘model = accelerator.prepare_model(model)’):
But this time I get this error:
I tried some fixes discussed in this thread: https://discuss.huggingface.co/t/multiple-gpu-in-sfttrainer/91899
Unfortunately, I still get errors:
This is my code now:
I removed:
and I modified my bash script:
Expected behavior
I would like to use accelerate to make my 1-GPU code work with more GPUs.