Quantization not working for models with BPE tokenizer #102

Closed
sanchitvj opened this issue Apr 25, 2024 · 2 comments · Fixed by #120

Comments

@sanchitvj (Collaborator)

Meta Llama-3 cannot be quantized because it uses a BPE tokenizer.

@sanchitvj (Collaborator, Author)

subprocess.CalledProcessError: Command '['python3', 'convert.py', '/home/ubuntu/volume_2k/Capstone_5/src/grag/quantize/test_data/llama.cpp/models/Meta-Llama-3-8B-Instruct/']' returned non-zero exit status 1.
Encountered the above error while quantizing Llama-3-8B because it uses a BPE tokenizer. To quantize such a model, the conversion has to be run with the following command line:
python convert.py models/mymodel/ --vocab-type bpe

  • Add support for such models.
  • Better error handling for models with different quantization mechanisms (a possible approach is sketched below).
  • Validate whether llama.cpp supports quantization for the given model.

@sanchitvj changed the title from "Quantization not working for models with BPW tokenizer" to "Quantization not working for models with BPE tokenizer" on Apr 25, 2024
@sanchitvj (Collaborator, Author)

This issue is resolved by PR #118. LLMs both with and without a BPE tokenizer are now supported.
