This comes from the original implementation and is not specific to any changes in the diffusers implementation, so it is expected.
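For context, here is a rough, paraphrased sketch of the CogVideoXLayerNormZero pattern (adaLN-zero-style modulation; not the exact diffusers code): the LayerNorm output is rescaled and shifted by projections of the timestep embedding, so the magnitude of norm_hidden_states is governed by learned scale/shift values rather than being bounded by the LayerNorm itself.

```python
import torch
import torch.nn as nn

class LayerNormZeroSketch(nn.Module):
    """Paraphrase of the CogVideoXLayerNormZero pattern (adaLN-zero style).
    Not the exact diffusers implementation."""

    def __init__(self, conditioning_dim: int, embedding_dim: int):
        super().__init__()
        self.silu = nn.SiLU()
        # Projects the timestep embedding into shift/scale/gate for both streams.
        self.linear = nn.Linear(conditioning_dim, 6 * embedding_dim)
        self.norm = nn.LayerNorm(embedding_dim)

    def forward(self, hidden_states, encoder_hidden_states, temb):
        shift, scale, gate, enc_shift, enc_scale, enc_gate = (
            self.linear(self.silu(temb)).chunk(6, dim=1)
        )
        # LayerNorm keeps values roughly unit-scale, but the learned (1 + scale)
        # and shift terms can push the modulated output to much larger magnitudes.
        hidden_states = self.norm(hidden_states) * (1 + scale)[:, None, :] + shift[:, None, :]
        encoder_hidden_states = (
            self.norm(encoder_hidden_states) * (1 + enc_scale)[:, None, :] + enc_shift[:, None, :]
        )
        return hidden_states, encoder_hidden_states, gate[:, None, :], enc_gate[:, None, :]
```

Since the scale and shift are learned, how large the modulated activations get is essentially a property of how the checkpoint was trained.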
Thank you for your answer, but when I run inference with cogvideoX1.0-5b, norm_hidden_states looks normal. I wonder why the scale changes so drastically in 1.5, while the result of 1.0 is more in line with what we would expect. I'm curious what causes this change; norm_hidden_states shouldn't normally be above 100.
The result of 1.0 is as follows:
I think this question is better suited for the original CogVideo repo, because it probably stems from training, and the model authors might be able to provide a better answer.
When I run inference with cogvideoX-1.5-t2v, I noticed something interesting: after the processing of `self.norm1 = CogVideoXLayerNormZero`, the values of norm_hidden_states are particularly large. What is the reason? The call is on this line of diffusers: https://github.com/huggingface/diffusers/blob/30f2e9bd202c89bb3862c8ada470d0d1ac8ee0e5/src/diffusers/models/transformers/cogvideox_transformer_3d.py#L127

`norm_hidden_states, norm_encoder_hidden_states, gate_msa, enc_gate_msa = self.norm1(hidden_states, encoder_hidden_states, temb)`