Fix parameter naming issue with Mistral 7B v0.3 #1480
Conversation
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: awaelchli <[email protected]>
Co-authored-by: rasbt <[email protected]>
Co-authored-by: Sebastian Raschka <[email protected]>
Thanks for the PR! If I understand correctly, the changes you are suggesting would only be in
Yep, sorry, I just realized this :) I was working on an older fork and made a mistake when syncing it to this specific branch (mistral-v0.3).
Yep! I believe that was the culprit behind the extra commits, though, as it synced my fork's branch to the main branch by default without asking :/ I'm opening a new PR now!
I prefer to sync my fork via the GitHub CLI tool (gh) whenever I want to create a new branch.
See #1431 and #1444.
After some investigation, I noticed two things:

- The repo contains a `model.safetensors.index.json` file (which contains the mapping between each parameter name and the corresponding chunk); for Mistral 7B v0.3, this file contains the "correct" parameter names, i.e., the ones that LitGPT is able to convert without issues.
- The repo also contains a `consolidated.safetensors` file (which gets 'converted' to `consolidated.bin` in `download.py`); it turns out this file contains the model parameters saved under different names (I suspect it was created from the original Mistral AI implementation), while the `model-0000X-of-00003.safetensors` files are created from the HF Transformers `MistralForCausalLM` class, so they have the "correct" weight names.

By downloading and using the `model.safetensors.index.json` file in the same way as the `pytorch_model.bin.index.json` file when working with '.bin' models from HF, the `consolidated.bin` file is not used (as the mapping is only necessary for the `model-0000X-of-00003.safetensors` files) and the model can be downloaded and converted without issues.

For the reason above, I was also considering preventing the download of `consolidated.safetensors` altogether. The `huggingface_hub.snapshot_download` function used in `download.py` accepts the arg `ignore_patterns`, which I would use to ignore this file.
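To illustrate the index-file mechanism, here is a minimal sketch of how `model.safetensors.index.json` ties parameter names to shard files; the entries below are illustrative examples, not the real Mistral 7B v0.3 index:

```python
import json

# Hypothetical excerpt of model.safetensors.index.json; the structure
# (a "weight_map" from parameter name to shard filename) follows the
# sharded-safetensors convention, but these entries are made up.
index_json = """
{
  "metadata": {"total_size": 14496047104},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors"
  }
}
"""

index = json.loads(index_json)

# The shard files actually referenced by the mapping. Note that
# consolidated.safetensors never appears here, which is why the download
# logic can ignore it entirely once the index file is used.
needed_files = sorted(set(index["weight_map"].values()))
print(needed_files)
```

Only the files listed in `weight_map` need to be fetched and converted, which is why relying on the index sidesteps the wrongly-named weights in `consolidated.safetensors`.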
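A quick sketch of the filtering that `ignore_patterns` would apply (the patterns are glob-style, matched here with `fnmatch`; the file list is an example, and the commented `snapshot_download` call shows where the patterns would be passed in `download.py`):

```python
from fnmatch import fnmatch

# Patterns we would pass as ignore_patterns to
# huggingface_hub.snapshot_download in download.py.
IGNORE = ["consolidated.safetensors"]

def kept(files, patterns=IGNORE):
    """Return the files that would survive the ignore filter."""
    return [f for f in files if not any(fnmatch(f, p) for p in patterns)]

# Example file listing for a safetensors checkpoint repo.
repo_files = [
    "model.safetensors.index.json",
    "model-00001-of-00003.safetensors",
    "consolidated.safetensors",
    "tokenizer.model.v3",
]
print(kept(repo_files))

# The actual download would then look something like:
# snapshot_download("mistralai/Mistral-7B-v0.3", ignore_patterns=IGNORE)
```

Everything except `consolidated.safetensors` survives the filter, so the problematic file is never downloaded in the first place.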