
Support sequential cpu offloading with torchao quantized tensors #3085

Merged



@a-r-r-o-w commented Sep 6, 2024

What does this PR do?

As discussed internally in https://huggingface.slack.com/archives/C068ZAHJZCZ/p1725454261264919

In the Diffusers-TorchAO experiments, enabling sequential CPU offloading on a pipeline whose modeling components have been quantized with TorchAO raises the following error:

Traceback (most recent call last):
  File "/home/aryan/work/diffusers/dump10.py", line 228, in <module>
    pipe.enable_sequential_cpu_offload()
  File "/home/aryan/work/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 1104, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 204, in cpu_offload
    attach_align_device_hook(
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  [Previous line repeated 4 more times]
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 503, in attach_align_device_hook
    add_hook_to_module(module, hook, append=True)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in add_hook_to_module
    module = hook.init_hook(module)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/hooks.py", line 308, in init_hook
    set_module_tensor_to_device(module, name, "meta")
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 440, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
TypeError: AffineQuantizedTensor.__new__() got an unexpected keyword argument 'requires_grad'
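The last frame shows the mechanism: while attaching its hooks, accelerate rebuilds each parameter as `param_cls(new_value, requires_grad=old_value.requires_grad)`, where `param_cls` is the class of the existing parameter. For a TorchAO-quantized weight that class is `AffineQuantizedTensor`, whose constructor does not accept `requires_grad`. The pure-Python sketch below reproduces only that signature mismatch. `PlainParam`, `QuantizedWeight`, `naive_rebuild`, `accepts_requires_grad` and `rebuild` are hypothetical stand-ins, not accelerate or TorchAO APIs, and the signature check is one possible guard rather than the fix this PR applies. A real quantized tensor class typically needs more constructor arguments than the raw data, so an actual fix has to rebuild it from the tensor's own components.

```python
import inspect


class PlainParam:
    """Hypothetical stand-in for a parameter class whose constructor accepts `requires_grad`."""

    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad


class QuantizedWeight:
    """Hypothetical stand-in for a quantized tensor class whose constructor rejects `requires_grad`."""

    def __init__(self, data):
        self.data = data
        self.requires_grad = False


def naive_rebuild(old_value, new_data):
    # Mirrors the failing traceback line: assumes every parameter class accepts `requires_grad`.
    param_cls = type(old_value)
    return param_cls(new_data, requires_grad=old_value.requires_grad)


def accepts_requires_grad(cls) -> bool:
    # Inspect the constructor signature instead of assuming it.
    return "requires_grad" in inspect.signature(cls).parameters


def rebuild(old_value, new_data):
    # Hypothetical guarded rebuild: only pass `requires_grad` when the class supports it.
    param_cls = type(old_value)
    if accepts_requires_grad(param_cls):
        return param_cls(new_data, requires_grad=old_value.requires_grad)
    # Fallback for classes with a different signature: construct, then restore the flag.
    rebuilt = param_cls(new_data)
    rebuilt.requires_grad = old_value.requires_grad
    return rebuilt
```

With these stand-ins, `naive_rebuild(QuantizedWeight([1, 2]), [3, 4])` raises a `TypeError` about the unexpected `requires_grad` keyword, the same shape as the traceback, while `rebuild` returns a `QuantizedWeight` and leaves `PlainParam` handling unchanged.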

Minimal reproducible example (MRE):

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
from diffusers.utils import export_to_video
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

# Either "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
model_id = "THUDM/CogVideoX-5b"
quantization = int8_weight_only

text_encoder = T5EncoderModel.from_pretrained(model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

pipe = CogVideoXPipeline.from_pretrained(
    model_id,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.enable_sequential_cpu_offload()

video = pipe(
    prompt="a panda dancing",
    num_inference_steps=1,
).frames[0]

export_to_video(video, "output.mp4", fps=8)

This PR adds the fix proposed in that discussion. With this branch, the memory usage for CogVideoX-5b is as follows:

memory=0.008
max_memory=3.081
max_reserved=3.908
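The PR does not say how these figures were collected or in which units. Assuming they are PyTorch CUDA allocator statistics converted to GiB (an assumption, not stated in the PR), a helper like the hypothetical `cuda_memory_report` below would produce numbers in this shape. `memory_report` and `to_gib` are the pure conversion steps and are also hypothetical names; only the `torch.cuda` calls are real PyTorch APIs.

```python
BYTES_PER_GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to three decimals."""
    return round(num_bytes / BYTES_PER_GIB, 3)


def memory_report(allocated: int, max_allocated: int, max_reserved: int) -> dict:
    """Map raw byte counts onto the three names used in the PR description."""
    return {
        "memory": to_gib(allocated),
        "max_memory": to_gib(max_allocated),
        "max_reserved": to_gib(max_reserved),
    }


def cuda_memory_report(device=None) -> dict:
    """Read current and peak CUDA allocator statistics (assumed measurement method)."""
    import torch  # imported lazily so the helpers above work without torch installed

    return memory_report(
        torch.cuda.memory_allocated(device),
        torch.cuda.max_memory_allocated(device),
        torch.cuda.max_memory_reserved(device),
    )
```

To scope the peak values to a single generation, one would call `torch.cuda.reset_peak_memory_stats()` before running the pipeline and `cuda_memory_report()` after it.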

Who can review?


@SunMarc @jerryzh168 @sayakpaul

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@SunMarc left a comment


LGTM ! Thanks !

@SunMarc merged commit 5ad982a into huggingface:main on Sep 6, 2024
25 checks passed
@a-r-r-o-w deleted the torchao-diffusers-sequential-cpu-offload branch on September 6, 2024 at 06:49