npuichigo changed the title from "[BUG] zero_to_fp32.py consolidated weights are all zero after this commit" to "[BUG][IMPORTANT] zero_to_fp32.py consolidated weights are all zero after this commit" on Nov 26, 2024.
I think this will affect many models. Looking at the code, the `state_dict` passed into `to_torch_tensor` is a shallow copy, so `state_dict[name] = torch.empty(tensor.shape, dtype=tensor.dtype)` overwrites the original weights read from the checkpoint.
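To illustrate the pitfall described above: a shallow copy of a dict creates a new dict object, so reassigning *keys* on the copy leaves the original intact, but a function that mutates entries of the dict object it actually receives clobbers the caller's data. The sketch below uses plain Python lists as stand-ins for tensors; the function name and data are hypothetical, not the real DeepSpeed code.

```python
def to_placeholder_tensors(state_dict):
    # Replaces each entry with a zero placeholder IN PLACE -- analogous to
    # state_dict[name] = torch.empty(tensor.shape, dtype=tensor.dtype)
    for name in state_dict:
        state_dict[name] = [0.0] * len(state_dict[name])
    return state_dict

weights = {"layer.weight": [0.5, -1.2, 3.3]}
shallow = dict(weights)  # shallow copy: new dict, same value references

# Reassigning keys on the shallow copy does not touch the original dict...
to_placeholder_tensors(shallow)
print(weights["layer.weight"])  # [0.5, -1.2, 3.3] -- original still intact

# ...but passing the original dict itself destroys the real weights:
to_placeholder_tensors(weights)
print(weights["layer.weight"])  # [0.0, 0.0, 0.0] -- "consolidated weights are all zero"
```

This is why replacing entries with empty tensors for shard-size estimation ends up zeroing the weights that are later written out.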
**Describe the bug**
After commit dd40269, specifying `max_shard_size` to `zero_to_fp32.py` generates empty weights.

**To Reproduce**
Use deepspeed v0.16.0.
**Reason**
See the code in DeepSpeed/deepspeed/utils/zero_to_fp32.py, lines 513 to 529 at f743fec.
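One way to avoid the in-place overwrite is to build a fresh dict of placeholder tensors rather than mutating the caller's `state_dict`. The sketch below is a hypothetical fix under that assumption (names and data are illustrative, not the actual DeepSpeed patch), again using lists as tensor stand-ins:

```python
def to_empty_tensors(state_dict):
    # Returns a NEW dict of zero placeholders for shard-size estimation;
    # the caller's real weights are left untouched.
    return {name: [0.0] * len(tensor) for name, tensor in state_dict.items()}

weights = {"layer.weight": [0.5, -1.2, 3.3]}
empties = to_empty_tensors(weights)  # used only to estimate shard sizes
print(weights["layer.weight"])       # [0.5, -1.2, 3.3] -- preserved
print(empties["layer.weight"])       # [0.0, 0.0, 0.0]
```

Returning a new mapping keeps the size-estimation pass side-effect free, so the consolidated weights written afterwards come from the untouched originals.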