DreamBooth: Using both "--train-text-encoder" and "--prior-preservation" results in RuntimeError #228

Closed
Xargonus opened this issue Aug 24, 2023 · 10 comments

Xargonus commented Aug 24, 2023

Both of the following commands work:

Not training the text encoder:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --prior-preservation \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4

Not using prior preservation:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4 \
  --train-text-encoder

If I try to use both --train-text-encoder and --prior-preservation together, like this:

autotrain dreambooth \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --output output/ \
  --image-path images/ \
  --class-image-path class_images \
  --prompt "a sks person" \
  --class-prompt "a person" \
  --num-class-images 200 \
  --prior-preservation \
  --resolution 1024 \
  --batch-size 1 \
  --num-steps 500 \
  --fp16 \
  --gradient-accumulation 4 \
  --lr 1e-4 \
  --train-text-encoder

I get the following RuntimeError in the first step of training:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/sdxl_training/bin/autotrain", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/cli/autotrain.py", line 42, in main
    command.run()
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/cli/run_dreambooth.py", line 468, in run
    train_dreambooth(params)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/main.py", line 249, in train
    trainer.train()
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/trainer.py", line 383, in train
    model_pred = self._get_model_pred(batch, channels, noisy_model_input, timesteps, bsz)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/autotrain/trainers/dreambooth/trainer.py", line 307, in _get_model_pred
    model_pred = self.unet(
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/accelerate/utils/operations.py", line 632, in forward
    return model_forward(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/accelerate/utils/operations.py", line 620, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 864, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/sdxl_training/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x2048 and 2816x1280)
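
For reference, the 2816 in that error is the input width expected by add_embedding.linear_1 in the SDXL base UNet: the six add_time_ids, each embedded to 256 dimensions, concatenated with the 1280-dim pooled text embedding. A quick sanity check of those numbers (taken from the stock stabilityai/stable-diffusion-xl-base-1.0 config, not from this trace):

# Sanity check of the expected input width of add_embedding.linear_1
# (values from the stock SDXL base UNet config, not autotrain code):
addition_time_embed_dim = 256  # Fourier embedding width per add_time_id
num_add_time_ids = 6           # original_size (2) + crop_coords (2) + target_size (2)
pooled_text_embed_dim = 1280   # pooled output of the second (OpenCLIP) text encoder

print(num_add_time_ids * addition_time_embed_dim + pooled_text_embed_dim)  # 2816

Receiving a 2048-wide input instead points at the added-condition tensors being assembled with mismatched batch sizes before that layer, which is what the discussion below tracks down.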

marcd123 commented Aug 28, 2023

I am also facing this issue. When fine-tuning SDXL 1.0, I get no errors as long as --prior-preservation and --train-text-encoder are not both enabled. When I train with both, I hit the same matrix-multiplication error, with the same dimensions:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x2048 and 2816x1280)

It originates in train() in trainers/dreambooth/trainer.py: line 383 calls _get_model_pred(), which eventually calls self.unet() on line 307, and the error surfaces further down in a torch forward method:

  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/dreambooth/trainer.py", line 383, in train
    model_pred = self._get_model_pred(batch, channels, noisy_model_input, timesteps, bsz)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/dreambooth/trainer.py", line 307, in _get_model_pred
    model_pred = self.unet(

My hyperparameters:

learning_rate = 1e-6 # @param {type:"number"}
num_steps = 1200 #@param {type:"number"}
batch_size = 1 # @param {type:"slider", min:1, max:32, step:1}
checkpointing_steps = 200 #@param {type:"number"}
gradient_accumulation = 4 # @param {type:"slider", min:1, max:32, step:1}
resolution = 1024 # @param {type:"slider", min:128, max:1024, step:128}
use_8bit_adam = True # @param ["False", "True"] {type:"raw"}
use_xformers = True # @param ["False", "True"] {type:"raw"}
use_fp16 = True # @param ["False", "True"] {type:"raw"}
train_text_encoder = True # @param ["False", "True"] {type:"raw"}
use_prior_preserving_loss = True # @param ["False", "True"] {type:"raw"}
num_class_images = 200 #@param {type:"number"}
gradient_checkpointing = True # @param ["False", "True"] {type:"raw"}

My autotrain script:

# Trains the BASE SDXL
!autotrain dreambooth \
--model ${MODEL_NAME} \
--output ${PROJECT_NAME} \
--image-path "${INSTANCE_FOLDER_PATH}" \
--prompt "${INSTANCE_PROMPT}" \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo '--prior-preservation' ) \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo "--class-image-path ${CLASS_FOLDER_PATH}" ) \
--class-prompt "photo of a person" \
$( [[ "$USE_PRIOR_PRESERVING_LOSS" == "True" ]] && echo "--num-class-images ${NUM_CLASS_IMAGES}" ) \
--checkpointing-steps ${CHECKPOINTING_STEPS} \
--resolution ${RESOLUTION} \
--batch-size ${BATCH_SIZE} \
--num-steps ${NUM_STEPS} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--lr ${LEARNING_RATE} \
$( [[ "$USE_FP16" == "True" ]] && echo "--fp16" ) \
$( [[ "$USE_XFORMERS" == "True" ]] && echo "--xformers" ) \
$( [[ "$TRAIN_TEXT_ENCODER" == "True" ]] && echo "--train-text-encoder" ) \
$( [[ "$USE_8BIT_ADAM" == "True" ]] && echo "--use-8bit-adam" ) \
$( [[ "$GRADIENT_CHECKPOINTING" == "True" ]] && echo "--gradient-checkpointing" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --hub-token ${HF_TOKEN} --hub-model-id ${REPO_ID}" )

marcd123 commented

It appears that the issue is related to using both --train-text-encoder and --prior-preservation at the same time.

Along the stack trace, this is where the two options overlap near the matrix-multiplication error. In _get_model_pred() in dreambooth/trainer.py, bsz is used to set elems_to_repeat, which affects several repeat calls in the rest of the function; when --prior-preservation is enabled, bsz is divided by two instead:

[screenshot: _get_model_pred() in trainers/dreambooth/trainer.py computing elems_to_repeat from bsz]
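
Roughly, the pattern is (paraphrasing the screenshot; names are approximate, not a verbatim copy of the trainer):

# With prior preservation the dataloader concatenates instance and class
# examples, so bsz is already doubled and the cached prompt embeddings only
# need half as many repeats to line up with the latents.
elems_to_repeat = bsz // 2 if self.config.prior_preservation else bsz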

I also notice that this is the one place where bsz sets the number of repeats directly rather than via elems_to_repeat (sketched after the screenshot below). Should this potentially be using elems_to_repeat instead?

[screenshot: the one repeat call that uses bsz directly]
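
For illustration, a hypothetical reconstruction of the two calls (not a verbatim copy of the trainer):

unet_added_conditions = {"time_ids": add_time_ids.repeat(elems_to_repeat, 1)}
# suspect: repeats by the raw bsz while everything else uses elems_to_repeat,
# so the added-condition tensors disagree in batch size whenever prior
# preservation makes elems_to_repeat == bsz // 2
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(bsz, 1)})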

marcd123 commented

As I suspected above, there appears to be a typo in the implementation of _get_model_pred. I checked the diffusers library example for fine-tuning SDXL with LoRA; it contains code similar to _get_model_pred, but its repeat condition differs in one place.

I created a fork, modified that one place, and ran my training again; I'm now getting past that error!

main...marcd123:autotrain-advanced:main

[screenshot: the one-line diff in the fork]
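
Paraphrasing the change (one line; the fork linked above is the authoritative diff):

# before: breaks when --prior-preservation halves elems_to_repeat
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(bsz, 1)})
# after: consistent with the other repeat calls
unet_added_conditions.update({"text_embeds": pooled_prompt_embeds.repeat(elems_to_repeat, 1)})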

@Xargonus you can try it by pip-uninstalling autotrain and then installing from my fork:
!pip install --upgrade --no-deps --force-reinstall git+https://github.com/marcd123/autotrain-advanced.git

marcd123 commented

PR: #245

DarkAlchy commented

It shows as merged, yet the issue remains; it bit me yesterday.

abhishekkrthakur commented

Will create a release sometime today!

abhishekkrthakur commented

Available from 0.6.36+.
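
Upgrading in place should pick it up (using the package's PyPI name):

pip install -U autotrain-advanced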

DarkAlchy commented

When will that be available? Only 0.6.35 is out right now.

abhishekkrthakur commented

Sorry, 0.6.35 works!

DarkAlchy commented

Thank you.
