
The value of each transformer block in the inference phase is particularly large #109

lgs00 opened this issue Dec 3, 2024 · 3 comments
lgs00 commented Dec 3, 2024

When I run inference with CogVideoX-1.5-T2V, I found something interesting: after processing by self.norm1 (a CogVideoXLayerNormZero layer), the values of norm_hidden_states are particularly large. What is the reason? This happens at this line of diffusers: https://github.com/huggingface/diffusers/blob/30f2e9bd202c89bb3862c8ada470d0d1ac8ee0e5/src/diffusers/models/transformers/cogvideox_transformer_3d.py#L127
norm_hidden_states, norm_encoder_hidden_states, gate_msa, enc_gate_msa = self.norm1(hidden_states, encoder_hidden_states, temb)
[screenshot: norm_hidden_states values after norm1]

a-r-r-o-w (Owner) commented Dec 3, 2024

The actual normalization happens in an internal layer, where the values are small; they are then scaled and shifted (adaptive layer norm):

https://github.com/huggingface/diffusers/blob/fc72e0f2616ff993733eaa0310f0253646e0c525/src/diffusers/models/normalization.py#L483

This comes from the original implementation and is not particular to any changes in the diffusers implementation, so it is expected.
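The mechanism described above can be illustrated with a minimal numpy sketch. This is not the diffusers CogVideoXLayerNormZero code; the function names, shapes, and the fixed scale/shift values are illustrative assumptions. The point is that the normalization itself produces small values, but the conditioning-derived scale and shift applied afterwards can make the returned tensor arbitrarily large.

```python
# Minimal sketch of adaptive layer norm (AdaLN). Hypothetical names and
# shapes; in CogVideoX the scale/shift come from the timestep embedding,
# here they are just constants for illustration.
import numpy as np

def layer_norm(x, eps=1e-5):
    # Plain normalization: per-token zero mean and unit variance,
    # so values here are small.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ada_layer_norm(x, scale, shift):
    # Adaptive layer norm: normalize first, then modulate with a
    # conditioning-derived scale and shift.
    normed = layer_norm(x)
    return normed * (1 + scale) + shift

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64)) * 50   # pre-norm activations (tokens x dim)

normed = layer_norm(x)              # small, roughly unit-variance values
# A large conditioning-derived scale blows the output back up,
# which is what shows up when you print norm_hidden_states.
out = ada_layer_norm(x, scale=30.0, shift=5.0)
```

So inspecting the returned norm_hidden_states shows the post-modulation values, not the small internal layer-norm output; the magnitude depends on the learned projection of the conditioning embedding.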

lgs00 (Author) commented Dec 3, 2024


Thank you for your answer, but when I run inference with CogVideoX-1.0-5B, norm_hidden_states looks normal. I wonder why the values grow so much in 1.5, while the result in 1.0 is more consistent with our expectations. I am curious what causes this change; norm_hidden_states should not normally exceed 100.
The result for 1.0 is as follows:
[screenshot: norm_hidden_states values for CogVideoX-1.0-5B]

a-r-r-o-w (Owner) commented Dec 3, 2024
I think this question is better suited to the original CogVideo repo, because the behavior probably stems from training, and the model authors might be able to provide a better answer.
