Fix parameter naming issue with Mistral 7B v0.3 #1480
Conversation
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: awaelchli <[email protected]>
Co-authored-by: rasbt <[email protected]>
Co-authored-by: Sebastian Raschka <[email protected]>
Thanks for the PR! If I understand correctly, the changes you are suggesting would only be in
Yep, sorry, I just realized this :) I was working on an older fork and made a mistake when syncing it to this specific branch (mistral-v0.3).
Yep! I believe that was the culprit behind the extra commits, though, as it synced my fork's branch to the main branch by default without asking :/ I'm opening a new PR now!
I prefer to sync my fork via the GitHub CLI tool (gh) whenever I want to create a new branch.
See #1431 and #1444.
After some investigation, I noticed two things:

- The repo contains a `model.safetensors.index.json` file (which contains the mapping between each parameter name and the corresponding chunk); for Mistral 7B v0.3, this file contains the "correct" parameter names, i.e., the ones that LitGPT is able to convert without issues.
- The repo also contains a `consolidated.safetensors` file (which gets 'converted' to `consolidated.bin` in `download.py`); it turns out this file contains the model parameters saved under different names (I suspect it was created from the original Mistral AI implementation), while the `model-0000X-of-00003.safetensors` files are created from the HF Transformers `MistralForCausalLM` class, so they have the "correct" weight names.

By downloading and using the `model.safetensors.index.json` file in the same way as the `pytorch_model.bin.index.json` file when working with '.bin' models from HF, the `consolidated.bin` file is not used (as the mapping is only necessary for the `model-0000X-of-00003.safetensors` files) and the model can be downloaded and converted without issues.

For the reason above, I was also considering preventing the download of `consolidated.safetensors` altogether. The `huggingface_hub.snapshot_download` function used in `download.py` accepts the arg `ignore_patterns`, which I would use to ignore this file.
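To illustrate the index-file mechanism, here is a minimal sketch of how `model.safetensors.index.json` ties parameter names to shard files; the entries below are illustrative examples, not the real Mistral 7B v0.3 index:

```python
import json

# Hypothetical excerpt of model.safetensors.index.json; the structure
# (a "weight_map" from parameter name to shard filename) follows the
# sharded-safetensors convention, but these entries are made up.
index_json = """
{
  "metadata": {"total_size": 14496047104},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors"
  }
}
"""

index = json.loads(index_json)

# The shard files actually referenced by the mapping. Note that
# consolidated.safetensors never appears here, which is why the download
# logic can ignore it entirely once the index file is used.
needed_files = sorted(set(index["weight_map"].values()))
print(needed_files)
```

Only the files listed in `weight_map` need to be fetched and converted, which is why relying on the index sidesteps the wrongly-named weights in `consolidated.safetensors`.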
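A quick sketch of the filtering that `ignore_patterns` would apply (the patterns are glob-style, matched here with `fnmatch`; the file list is an example, and the commented `snapshot_download` call shows where the patterns would be passed in `download.py`):

```python
from fnmatch import fnmatch

# Patterns we would pass as ignore_patterns to
# huggingface_hub.snapshot_download in download.py.
IGNORE = ["consolidated.safetensors"]

def kept(files, patterns=IGNORE):
    """Return the files that would survive the ignore filter."""
    return [f for f in files if not any(fnmatch(f, p) for p in patterns)]

# Example file listing for a safetensors checkpoint repo.
repo_files = [
    "model.safetensors.index.json",
    "model-00001-of-00003.safetensors",
    "consolidated.safetensors",
    "tokenizer.model.v3",
]
print(kept(repo_files))

# The actual download would then look something like:
# snapshot_download("mistralai/Mistral-7B-v0.3", ignore_patterns=IGNORE)
```

Everything except `consolidated.safetensors` survives the filter, so the problematic file is never downloaded in the first place.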