Hello,
Thanks for your wonderful work. I am doing some testing with your code. However, I found a very strange problem.
I want to print the weight shape of lm_head (https://github.com/X-PLUG/mPLUG-Owl/blob/main/mPLUG-Owl2/mplug_owl2/model/modeling_mplug_owl2.py#L220) with the following code:
    print("Before initializing lm_head: ", config.hidden_size, config.vocab_size)
    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
    print("After initializing lm_head: ", config.hidden_size, config.vocab_size)
    print("weight shape: ", self.lm_head.weight.shape)
The results are:
    Before initializing lm_head: 4096 32000
    After initializing lm_head: 4096 32000
    weight shape: torch.Size([0])
I am very confused about why lm_head.weight.shape is torch.Size([0]). Do you have any insight into this problem?
Monitoring this parameter is very important to me, but I cannot obtain it during training.
Thanks.
Are you using the DeepSpeed ZeRO-3 strategy to initialize the model? If so, the parameters may be partitioned or offloaded, which would explain the empty weight shape.
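If ZeRO-3 is indeed the cause, one way to still read the shape is sketched below. It assumes that DeepSpeed attaches the unpartitioned shape to each partitioned parameter as a `ds_shape` attribute while `param.shape` stays a zero-size placeholder. The helper name `full_param_shape` is hypothetical, and the demo uses plain stand-in objects rather than real tensors so it runs without a distributed setup.

```python
from types import SimpleNamespace


def full_param_shape(param):
    """Return the unpartitioned shape of a parameter as a tuple.

    Assumption: under DeepSpeed ZeRO stage 3, a partitioned parameter keeps
    its full shape in `ds_shape`, while `param.shape` is an empty placeholder.
    If `ds_shape` is absent, the parameter is treated as an ordinary one.
    """
    ds_shape = getattr(param, "ds_shape", None)
    if ds_shape is not None:
        return tuple(ds_shape)
    return tuple(param.shape)


# Stand-ins for illustration only (not real torch parameters).
# nn.Linear(4096, 32000) stores its weight as (out_features, in_features).
partitioned = SimpleNamespace(shape=(0,), ds_shape=(32000, 4096))
ordinary = SimpleNamespace(shape=(32000, 4096))

print("partitioned:", full_param_shape(partitioned))
print("ordinary:", full_param_shape(ordinary))
```

In the model itself this would be called as `full_param_shape(self.lm_head.weight)`. To monitor the actual values rather than just the shape, the weight would have to be gathered first, for example inside DeepSpeed's `deepspeed.zero.GatheredParameters` context manager, at the cost of extra communication.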