Fix parameter naming issue with Mistral 7B v0.3 [fixed] #1481
@rasbt This should fix #1480. See also #1431 and #1444.
After some investigation, I noticed two things:

- `model.safetensors.index.json` contains the mapping between each parameter name and the chunk that stores it; for Mistral 7B v0.3, this file contains the "correct" parameter names, i.e., the ones that LitGPT is able to convert without issues (a sketch of this mapping follows the list).
- `consolidated.safetensors` (which gets 'converted' to `consolidated.bin` in `download.py`) contains the model parameters saved under different names; I suspect it was created from the original Mistral AI implementation, while the `model-0000X-of-00003.safetensors` files are created from the HF Transformers `MistralForCausalLM` class, so they have the "correct" weight names.
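For reference, a minimal sketch of what that mapping looks like; the parameter name in the comment is an illustrative example of the HF Mistral layout, not copied from the actual file:

```python
import json

# Sketch: the index file maps each parameter name to the shard that holds it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]  # {parameter name: shard filename}
# e.g. "model.layers.0.self_attn.q_proj.weight" -> "model-00001-of-00003.safetensors"
print(next(iter(weight_map.items())))
```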
By downloading and using the `model.safetensors.index.json` file in the same way as the `pytorch_model.bin.index.json` file when working with `.bin` models from HF, the `consolidated.bin` file is not used (the mapping only references the `model-0000X-of-00003.safetensors` files), and the model can be downloaded and converted without issues.
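As a quick illustration of why `consolidated.bin` drops out: the shard names referenced by the weight map never include it (a sketch under the same assumptions as above):

```python
import json

# Sketch: collect the shards the index actually references.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

shards = sorted(set(index["weight_map"].values()))
print(shards)  # only model-0000X-of-00003.safetensors files, no consolidated.*
```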
I was also considering preventing the download of `consolidated.safetensors` entirely, for the reasons above. The `huggingface_hub.snapshot_download` function used in `download.py` accepts an `ignore_patterns` argument, which I would use to ignore this file.
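Roughly, that would look like this; a sketch only, with an illustrative repo id rather than the exact call in `download.py`:

```python
from huggingface_hub import snapshot_download

# Sketch: skip consolidated.safetensors entirely; the sharded files plus the
# index are all that the conversion needs.
snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.3",
    ignore_patterns=["consolidated.safetensors"],
)
```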