tokenizer.model #186

hasakikiki · 2024-06-27T12:55:22Z

I fine-tuned an LLM based on the Llama architecture and used convert_hf_checkpoint and quantize to complete the quantization. However, when I run generation, the tokenizer.model file is missing. How can I obtain or generate it?

yanboliang · 2024-09-16T04:02:13Z

Which model are you running inference on? If it's a Llama 3+ model, convert_hf_checkpoint already copies tokenizer.model to the right place:

gpt-fast/scripts/convert_hf_checkpoint.py

Lines 118 to 126 in c9f683e

    if 'llama-3' in model_name.lower():
        if 'llama-3.1-405b' in model_name.lower():
            original_dir = checkpoint_dir / "original" / "mp16"
        else:
            original_dir = checkpoint_dir / "original"
        tokenizer_model = original_dir / "tokenizer.model"
        tokenizer_model_tiktoken = checkpoint_dir / "tokenizer.model"
        print(f"Copying {tokenizer_model} to {tokenizer_model_tiktoken}")
        shutil.copy(tokenizer_model, tokenizer_model_tiktoken)
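
The snippet above only runs when the model name contains 'llama-3', so a fine-tuned checkpoint saved under a different name would presumably skip it and end up without tokenizer.model. A minimal sketch of doing the same copy by hand follows. The helper name `ensure_tokenizer_model` and its `source` parameter are hypothetical and not part of gpt-fast; it also assumes the tokenizer file either sits under `original/` in the checkpoint directory (the layout the script expects) or is supplied explicitly, for example from the base model the fine-tune started from.

```python
import shutil
from pathlib import Path
from typing import Optional


def ensure_tokenizer_model(checkpoint_dir: Path, source: Optional[Path] = None) -> Path:
    """Place tokenizer.model in checkpoint_dir, copying it if it is missing.

    Hypothetical helper, not part of gpt-fast. `source` is an optional
    explicit path to a tokenizer.model file (e.g. from the base model).
    """
    target = checkpoint_dir / "tokenizer.model"
    if target.exists():
        # Nothing to do: the file is already where generation looks for it.
        return target

    # Prefer the explicit source; otherwise fall back to the "original"
    # subdirectory, mirroring the layout used by the conversion script.
    candidate = source if source is not None else checkpoint_dir / "original" / "tokenizer.model"
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No tokenizer.model found at {candidate}; "
            "pass the base model's tokenizer file via `source`."
        )

    print(f"Copying {candidate} to {target}")
    shutil.copy(candidate, target)
    return target
```

A call might look like `ensure_tokenizer_model(Path("checkpoints/my-finetune"), source=Path("checkpoints/base-model/original/tokenizer.model"))`, where both paths are placeholders. This only helps if the fine-tune kept the base model's vocabulary unchanged; if tokens were added during fine-tuning, the base tokenizer file would no longer match the model.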
