Fix vocab size padding in Llama3 config #1334

awaelchli · 2024-04-22T16:27:23Z

The Hugging Face tokenizer reports the padded vocab size. To keep our config consistent with that, I'm setting the padded vocab size explicitly.
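For readers unfamiliar with the issue, here is a minimal sketch of why an explicit value can matter. It is not the actual litgpt `Config` class: `SketchConfig`, `find_multiple`, and the `padding_multiple` default of 512 are hypothetical names and values chosen for illustration, and the sizes 128000 and 128256 are assumed figures for Llama 3 rather than numbers taken from this PR. The idea is that a padded size derived by rounding up can differ from the size the tokenizer reports, so the config pins it instead.

```python
from dataclasses import dataclass
from typing import Optional


def find_multiple(n: int, k: int) -> int:
    """Round n up to the nearest multiple of k (hypothetical helper)."""
    if n % k == 0:
        return n
    return n + k - (n % k)


@dataclass
class SketchConfig:
    """Illustrative stand-in for a model config; not the real litgpt class."""

    vocab_size: int
    padding_multiple: int = 512  # assumed default, for illustration only
    padded_vocab_size: Optional[int] = None

    def __post_init__(self) -> None:
        # Derive the padded size only when it was not given explicitly.
        if self.padded_vocab_size is None:
            self.padded_vocab_size = find_multiple(
                self.vocab_size, self.padding_multiple
            )


# Assumed sizes: 128000 base tokens, 128256 reported by the tokenizer.
derived = SketchConfig(vocab_size=128000)
explicit = SketchConfig(vocab_size=128000, padded_vocab_size=128256)

print(derived.padded_vocab_size)   # rounding alone leaves 128000 unchanged
print(explicit.padded_vocab_size)  # the explicit value is kept as given
```

Under these assumptions, 128000 is already a multiple of 512, so the derived value would not match what the tokenizer reports; setting the padded size explicitly removes that mismatch.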

carmocca

Thanks.

The litgpt CI uses my personal HF token, and I was only just given access to Llama 3, which is why this started surfacing in CI.

awaelchli · 2024-04-22T17:27:16Z

I suspected as much. I missed it because I hadn't set my token env variable when running the tests locally.

awaelchli added 2 commits April 22, 2024 18:26

vocab size padding

9a0a430

update

4201e31

awaelchli marked this pull request as ready for review April 22, 2024 16:48

awaelchli requested review from carmocca and lantiga as code owners April 22, 2024 16:48

carmocca approved these changes Apr 22, 2024

awaelchli merged commit 54628ec into main Apr 22, 2024
9 checks passed

awaelchli deleted the llama3-vocab-size branch April 22, 2024 17:27
