Fix parameter naming issue with Mistral 7B v0.3 [fixed] #1481
@rasbt This should fix #1480. See also #1431 and #1444.
After some investigation, I noticed two things:

- `model.safetensors.index.json` contains the mapping between each parameter name and the chunk that stores it; for Mistral 7B v0.3, this file contains the "correct" parameter names, i.e., the ones that LitGPT is able to convert without issues (a sketch of this mapping follows the list).
- `consolidated.safetensors` (which gets 'converted' to `consolidated.bin` in `download.py`) contains the model parameters saved under different names; I suspect it was created from the original Mistral AI implementation, while the `model-0000X-of-00003.safetensors` files are created from the HF Transformers `MistralForCausalLM` class, so they have the "correct" weight names.
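For reference, a minimal sketch of what that mapping looks like; the parameter name in the comment is an illustrative example of the HF Mistral layout, not copied from the actual file:

```python
import json

# Sketch: the index file maps each parameter name to the shard that holds it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]  # {parameter name: shard filename}
# e.g. "model.layers.0.self_attn.q_proj.weight" -> "model-00001-of-00003.safetensors"
print(next(iter(weight_map.items())))
```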
By downloading and using the `model.safetensors.index.json` file in the same way as the `pytorch_model.bin.index.json` file when working with `.bin` models from HF, the `consolidated.bin` file is not used (the mapping only references the `model-0000X-of-00003.safetensors` files), and the model can be downloaded and converted without issues.
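As a quick illustration of why `consolidated.bin` drops out: the shard names referenced by the weight map never include it (a sketch under the same assumptions as above):

```python
import json

# Sketch: collect the shards the index actually references.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

shards = sorted(set(index["weight_map"].values()))
print(shards)  # only model-0000X-of-00003.safetensors files, no consolidated.*
```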
I was also considering preventing the download of `consolidated.safetensors` entirely, for the reasons above. The `huggingface_hub.snapshot_download` function used in `download.py` accepts an `ignore_patterns` argument, which I would use to ignore this file.
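Roughly, that would look like this; a sketch only, with an illustrative repo id rather than the exact call in `download.py`:

```python
from huggingface_hub import snapshot_download

# Sketch: skip consolidated.safetensors entirely; the sharded files plus the
# index are all that the conversion needs.
snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.3",
    ignore_patterns=["consolidated.safetensors"],
)
```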