
Fix parameter naming issue with Mistral 7B v0.3 #1480

Closed
davmacario wants to merge 32 commits into main from mistral-v0.3

Conversation

davmacario
Contributor

See #1431 and #1444.

After some investigation, I noticed two things:

  1. The current download script (litgpt/scripts/download.py), when downloading safetensors models, does not fetch the file model.safetensors.index.json (which contains the mapping between each parameter name and the shard that stores it); for Mistral 7B v0.3, this file contains the "correct" parameter names, i.e., the ones that LitGPT is able to convert without issues.
  2. The parameter-naming issue occurs when attempting the conversion from the file consolidated.safetensors (which gets 'converted' to consolidated.bin in download.py); it turns out this file stores the model parameters under different names (I suspect it was created from the original Mistral AI implementation), while the files model-0000X-of-00003.safetensors were exported from the HF Transformers "MistralForCausalLM" class, so they carry the "correct" weight names.

By downloading and using the model.safetensors.index.json file in the same way as pytorch_model.bin.index.json when working with '.bin' models from HF, the consolidated.bin file is never touched (the mapping only refers to the model-0000X-of-00003.safetensors files), and the model can be downloaded and converted without issues.
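
For reference, here is a minimal sketch of reading the index file (it follows the standard HF format; the paths and variable names below are illustrative, not the actual litgpt code):

```python
# Sketch: inspect model.safetensors.index.json, whose "weight_map" maps each
# HF parameter name to the shard file that stores it. Paths are illustrative.
import json
from pathlib import Path

checkpoint_dir = Path("checkpoints/mistralai/Mistral-7B-v0.3")  # hypothetical location
index = json.loads((checkpoint_dir / "model.safetensors.index.json").read_text())

# e.g. "model.layers.0.self_attn.q_proj.weight" -> "model-00001-of-00003.safetensors"
weight_map = index["weight_map"]
shards = sorted(set(weight_map.values()))
print(f"{len(weight_map)} parameters across {len(shards)} shard(s): {shards}")
```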

I was also considering preventing the download of consolidated.safetensors altogether, for the reasons above.
The huggingface_hub.snapshot_download function used in download.py accepts an ignore_patterns argument, which I would use to skip this file.
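
Something along these lines (a sketch of the proposed change, not the exact download.py call; the repo id and target dir are illustrative):

```python
# Sketch: skip the Mistral-AI-format weights file when downloading, via the
# ignore_patterns argument of huggingface_hub.snapshot_download.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.3",
    local_dir="checkpoints/mistralai/Mistral-7B-v0.3",  # illustrative target dir
    ignore_patterns=["consolidated.safetensors"],  # redundant for conversion
)
```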

rasbt and others added 30 commits May 23, 2024 18:29
@rasbt
Collaborator

rasbt commented Jun 12, 2024

Thanks for the PR! If I understand correctly, the changes you are suggesting would only be in litgpt/scripts/convert_hf_checkpoint.py? Unfortunately, it looks like something went wrong during your rebase, and lots of duplicated changes got committed: there are currently more than 100 files changed. Would you mind rebasing again off of the recent main branch, or maybe opening a fresh PR with just the files to be changed, e.g., litgpt/scripts/convert_hf_checkpoint.py? Let me know if you need help with that.

@davmacario
Contributor Author

Yep, sorry, I just realized this :) I was working on an older fork and made a mistake when syncing it to this specific branch (mistral-v0.3).
Will do ASAP!

@davmacario davmacario closed this Jun 12, 2024
@davmacario davmacario deleted the mistral-v0.3 branch June 12, 2024 14:23
@rasbt
Collaborator

rasbt commented Jun 12, 2024

Thanks! By the way, GitHub now has a useful "Sync fork" feature that you can use in your forked repo.
[Screenshot: GitHub's "Sync fork" button, 2024-06-12]

@davmacario
Contributor Author

davmacario commented Jun 12, 2024

Yep! I believe it was the culprit of the extra commits, though, as it synced my fork's branch to the main branch by default, without asking :/
When I synced to this branch using git remotes instead, there were no issues.

I'm opening a new PR now!

@Andrei-Aksionov
Collaborator

I prefer to sync my fork via the GitHub CLI tool (gh) whenever I want to create a new branch.
It speeds up the process a little bit.
