I want to fine-tune (LoRA and prefix tuning) LLaMA 70B on 4 A40 GPUs.
My plan is to use a quantized version of LLaMA during the fine-tuning phase, but I have not found any implementation for this in the Lit-llama source code. As far as I can tell, quantization is only implemented for inference, in the generate script; there is no support for it in fine-tuning. So my question is: how can I implement quantized fine-tuning? Is there a sample anywhere?
Thanks in advance.
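To make the question concrete: the approach I have in mind (as in QLoRA-style training) keeps the frozen base weights in a low-bit format and trains only small full-precision LoRA adapters. A minimal NumPy sketch of that idea is below; the names (`quantize_int8`, `lora_a`, `lora_b`) are illustrative, not Lit-llama API, and real training would use PyTorch autograd over the adapter parameters only.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store int8 weights plus a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Frozen base weight, stored quantized (this is where the memory saving comes from).
d_in, d_out, rank = 16, 8, 2
w = rng.normal(size=(d_out, d_in)).astype(np.float32)
q_w, scale = quantize_int8(w)

# Trainable LoRA adapters in full precision. B starts at zero, so the
# adapted layer initially computes the same output as the quantized base layer.
lora_a = (rng.normal(size=(rank, d_in)) * 0.01).astype(np.float32)
lora_b = np.zeros((d_out, rank), dtype=np.float32)

def forward(x):
    # Dequantize the frozen weight on the fly; only lora_a / lora_b
    # would receive gradient updates during fine-tuning.
    return dequantize(q_w, scale) @ x + lora_b @ (lora_a @ x)

x = rng.normal(size=(d_in,)).astype(np.float32)
base_out = dequantize(q_w, scale) @ x
print(np.allclose(forward(x), base_out))  # True: with B = 0 the adapter adds nothing yet
```

The point of the sketch is the split of responsibilities: the quantized weight is treated as a constant in the forward pass, while the low-rank factors stay in float32 and absorb all the task-specific updates.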
Saeed