DreamBooth: Using both "--train-text-encoder" and "--prior-preservation" results in RuntimeError #228

Closed
Xargonus opened this issue Aug 24, 2023 · 10 comments

Xargonus commented Aug 24, 2023

Both of the following commands work:

Not training the text encoder:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --prior-preservation \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4

Not using prior preservation:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4 \
  --train-text-encoder

If I try to use both --train-text-encoder and --prior-preservation together, like this:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --prior-preservation \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4 \
  --train-text-encoder

I get the following RuntimeError in the first step of training:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/sdxl_training/bin/autotrain", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/cli/autotrain.py", line 42, in main
    command.run()
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/cli/run_dreambooth.py", line 468, in run
    train_dreambooth(params)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/main.py", line 249, in train
    trainer.train()
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/trainer.py", line 383, in train
    model_pred = self._get_model_pred(batch, channels, noisy_model_input, timesteps, bsz)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/trainer.py", line 307, in _get_model_pred
    model_pred = self.unet(
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/accelerate/utils/operations.py", line 632, in forward
    return model_forward(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/accelerate/utils/operations.py", line 620, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 864, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x2048 and 2816x1280)
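
For reference, the 2816 in that error is the input width expected by add_embedding.linear_1 in the SDXL base UNet: the six add_time_ids, each embedded to 256 dimensions, concatenated with the 1280-dim pooled text embedding. A quick sanity check of those numbers (taken from the stock stabilityai/stable-diffusion-xl-base-1.0 config, not from this trace):

# Sanity check of the expected input width of add_embedding.linear_1
# (values from the stock SDXL base UNet config, not autotrain code):
addition_time_embed_dim = 256  # Fourier embedding width per add_time_id
num_add_time_ids = 6           # original_size (2) + crop_coords (2) + target_size (2)
pooled_text_embed_dim = 1280   # pooled output of the second (OpenCLIP) text encoder

print(num_add_time_ids * addition_time_embed_dim + pooled_text_embed_dim)  # 2816

Receiving a 2048-wide input instead points at the added-condition tensors being assembled with mismatched batch sizes before that layer, which is what the discussion below tracks down.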

marcd123 commented Aug 28, 2023

I am also facing this issue. When fine-tuning SDXL 1.0, I get no errors as long as --prior-preservation and --train-text-encoder are not both enabled. When I train with both, I hit the same matrix-multiplication error, with the same dimensions:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x2048 and 2816x1280)

It originates in train() in trainers/dreambooth/trainer.py: line 383 calls _get_model_pred(), which eventually calls self.unet() on line 307, and the error surfaces further down in a torch forward method:

  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/dreambooth/trainer.py", line 383, in train
    model_pred = self._get_model_pred(batch, channels, noisy_model_input, timesteps, bsz)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/dreambooth/trainer.py", line 307, in _get_model_pred
    model_pred = self.unet(

My hyperparameters:

learning_rate = 1e-6 # @param {type:"number"}
num_steps = 1200 #@param {type:"number"}
batch_size = 1 # @param {type:"slider", min:1, max:32, step:1}
checkpointing_steps = 200 #@param {type:"number"}
gradient_accumulation = 4 # @param {type:"slider", min:1, max:32, step:1}
resolution = 1024 # @param {type:"slider", min:128, max:1024, step:128}
use_8bit_adam = True # @param ["False", "True"] {type:"raw"}
use_xformers = True # @param ["False", "True"] {type:"raw"}
use_fp16 = True # @param ["False", "True"] {type:"raw"}
train_text_encoder = True # @param ["False", "True"] {type:"raw"}
use_prior_preserving_loss = True # @param ["False", "True"] {type:"raw"}
num_class_images = 200 #@param {type:"number"}
gradient_checkpointing = True # @param ["False", "True"] {type:"raw"}

My autotrain script:

# Trains the BASE SDXL
!autotrain dreambooth \
--model ${MODEL_NAME} \
--output ${PROJECT_NAME} \
--image-path "${INSTANCE_FOLDER_PATH}" \
--prompt "${INSTANCE_PROMPT}" \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo '--prior-preservation' ) \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo "--class-image-path ${CLASS_FOLDER_PATH}" ) \
--class-prompt "photo of a person" \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo "--num-class-images ${NUM_CLASS_IMAGES}" ) \
--checkpointing-steps ${CHECKPOINTING_STEPS} \
--resolution ${RESOLUTION} \
--batch-size ${BATCH_SIZE} \
--num-steps ${NUM_STEPS} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--lr ${LEARNING_RATE} \
$( [[ "$USE_FP16" == "True" ]] && echo "--fp16" ) \
$( [[ "$USE_XFORMERS" == "True" ]] && echo "--xformers" ) \
$( [[ "$TRAIN_TEXT_ENCODER" == "True" ]] && echo "--train-text-encoder" ) \
$( [[ "$USE_8BIT_ADAM" == "True" ]] && echo "--use-8bit-adam" ) \
$( [[ "$GRADIENT_CHECKPOINTING" == "True" ]] && echo "--gradient-checkpointing" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --hub-token ${HF_TOKEN} --hub-model-id ${REPO_ID}" )

marcd123 commented

It appears that the issue is related to using both --train-text-encoder and --prior-preservation at the same time.

Along the stack trace, this is where the two options overlap near the matrix-multiplication error. In _get_model_pred() in dreambooth/trainer.py, bsz is used to set elems_to_repeat, which affects several repeat calls in the rest of the function; when --prior-preservation is enabled, bsz is divided by two instead:

[screenshot: _get_model_pred() in trainers/dreambooth/trainer.py computing elems_to_repeat from bsz]
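
Roughly, the pattern is (paraphrasing the screenshot; names are approximate, not a verbatim copy of the trainer):

# With prior preservation the dataloader concatenates instance and class
# examples, so bsz is already doubled and the cached prompt embeddings only
# need half as many repeats to line up with the latents.
elems_to_repeat = bsz // 2 if self.config.prior_preservation else bsz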

I also notice that this is the one place where bsz sets the number of repeats directly rather than via elems_to_repeat (sketched after the screenshot below). Should this potentially be using elems_to_repeat instead?

[screenshot: the one repeat call that uses bsz directly]
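
For illustration, a hypothetical reconstruction of the two calls (not a verbatim copy of the trainer):

unet_added_conditions = {"time_ids": add_time_ids.repeat(elems_to_repeat, 1)}
# suspect: repeats by the raw bsz while everything else uses elems_to_repeat,
# so the added-condition tensors disagree in batch size whenever prior
# preservation makes elems_to_repeat == bsz // 2
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(bsz, 1)})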

marcd123 commented

As I suspected above, there appears to be a typo in the implementation of _get_model_pred. I checked the diffusers library example for fine-tuning SDXL with LoRA; it contains code similar to _get_model_pred, but its repeat condition differs in one place.

I created a fork, modified that one place, and ran my training again; I'm now getting past that error!

main...marcd123:autotrain-advanced:main

[screenshot: the one-line diff in the fork]
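
Paraphrasing the change (one line; the fork linked above is the authoritative diff):

# before: breaks when --prior-preservation halves elems_to_repeat
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(bsz, 1)})
# after: consistent with the other repeat calls
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(elems_to_repeat, 1)})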

@Xargonus you can try it by pip-uninstalling autotrain and then installing from my fork:
!pip install --upgrade --no-deps --force-reinstall git+https://github.com/marcd123/autotrain-advanced.git

marcd123 commented

PR: #245

DarkAlchy commented

It shows as merged, yet the issue remains; it bit me yesterday.

abhishekkrthakur commented

Will create a release sometime today!

abhishekkrthakur commented

Available from 0.6.36+.
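
Upgrading in place should pick it up (using the package's PyPI name):

pip install -U autotrain-advanced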

DarkAlchy commented

When will that be available? Only 0.6.35 is out right now.

abhishekkrthakur commented

Sorry, 0.6.35 works!

DarkAlchy commented

Thank you.
