
Fix parameter naming issue with Mistral 7B v0.3 [fixed] #1481

Closed

Conversation

davmacario (Contributor)

@rasbt This should fix #1480.


See #1431 and #1444.

After some investigation, I noticed two things:

  1. The current download script (litgpt/scripts/download.py) does not download the file model.safetensors.index.json when using safetensors models. This file maps each parameter name to the shard that contains it, and for Mistral 7B v0.3 it uses the "correct" parameter names, i.e., the ones that LitGPT can convert without issues.
  2. The parameter naming issue occurs when attempting the conversion from the file consolidated.safetensors (which gets 'converted' to consolidated.bin in download.py). It turns out this file stores the model parameters under different names (I suspect it was created from the original Mistral AI implementation), while the files model-0000X-of-00003.safetensors were created from the HF Transformers "MistralForCausalLM" class and therefore use the "correct" weight names.

By downloading model.safetensors.index.json and using it the same way as pytorch_model.bin.index.json when working with '.bin' models from HF, the consolidated.bin file is never used (the mapping only references the model-0000X-of-00003.safetensors files), and the model can be downloaded and converted without issues.
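To illustrate the mechanism described above: the index file contains a "weight_map" from parameter names to shard filenames, so the set of files to load is exactly the values of that map. The JSON below is a minimal, made-up excerpt, not the real Mistral 7B v0.3 checkpoint contents.

```python
import json

# Illustrative excerpt of a model.safetensors.index.json (entries are
# hypothetical; a real index lists every parameter in the model).
index_json = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors"
  }
}
"""
weight_map = json.loads(index_json)["weight_map"]

# The shards to load are exactly the values of the weight map, so a file
# like consolidated.safetensors that never appears there is never touched.
shards = sorted(set(weight_map.values()))
```

Because conversion only visits files named in the weight map, the differently-named parameters in consolidated.safetensors can no longer trip up the converter.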

I was also considering preventing the download of consolidated.safetensors for the reasons above. The huggingface_hub.snapshot_download function used in download.py accepts an ignore_patterns argument, which could be used to skip this file.
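For illustration, the call could look like the commented sketch below (the repo id and pattern list are assumptions, not the PR's actual code). Since running snapshot_download needs network access, the second half mimics its glob-style filename filtering locally so the effect of ignore_patterns can be seen directly.

```python
# Hypothetical sketch; the actual call in download.py may differ:
#   from huggingface_hub import snapshot_download
#   snapshot_download("mistralai/Mistral-7B-v0.3",
#                     ignore_patterns=["consolidated.safetensors"])
#
# ignore_patterns uses glob-style matching; fnmatch approximates it here.
from fnmatch import fnmatch

def filter_ignored(filenames, ignore_patterns):
    """Keep only files matching none of the ignore patterns."""
    return [f for f in filenames
            if not any(fnmatch(f, pat) for pat in ignore_patterns)]

# Illustrative repo file listing.
repo_files = [
    "consolidated.safetensors",
    "model-00001-of-00003.safetensors",
    "model.safetensors.index.json",
    "tokenizer.model.v3",
]
kept = filter_ignored(repo_files, ["consolidated.safetensors"])
# consolidated.safetensors is skipped; the HF shards and index remain.
```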

@rasbt rasbt left a comment


Thanks for the PR! Will test this out locally but looks good!

Review comments on litgpt/scripts/convert_hf_checkpoint.py (outdated, resolved)

rasbt commented Jun 12, 2024

It worked fine when I tried it on my end. However, when I checked out your branch, it was still based on a pretty old LitGPT version, so I had to rebase your commits on top of the current main branch to be able to run it. I will have to force push this ... fingers crossed I didn't mess things up 🤞.


rasbt commented Jun 12, 2024

I don't quite understand what happened here ... I thought this would have fixed the CI issues, but somehow it didn't ... I'll figure it out.

rasbt force-pushed the mistral-v0.3 branch 2 times, most recently from 91a5a5b to 8f65463, on Jun 12, 2024 at 18:56
rasbt closed this Jun 12, 2024
rasbt mentioned this pull request Jun 12, 2024

rasbt commented Jun 12, 2024

It seems to be fine now that I have applied your changes on top of the recent main branch. Sorry about that; I think your synced fork was still in an old state. I really appreciate your PR and this fix to LitGPT!
