I want to fine-tune (LoRA and prefix tuning) LLaMA 70B on 4 A40 GPUs.
My plan is to use a quantized version of LLaMA during the fine-tuning phase, but I have not found any implementation for this in the Lit-llama source code. As far as I can tell, quantization is only implemented for inference, in the generate script; there is no support for it in fine-tuning. So my question is: how can I implement quantized fine-tuning? Is there a sample anywhere?
Thanks in advance.
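To make the question concrete: the approach I have in mind (as in QLoRA-style training) keeps the frozen base weights in a low-bit format and trains only small full-precision LoRA adapters. A minimal NumPy sketch of that idea is below; the names (`quantize_int8`, `lora_a`, `lora_b`) are illustrative, not Lit-llama API, and real training would use PyTorch autograd over the adapter parameters only.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store int8 weights plus a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Frozen base weight, stored quantized (this is where the memory saving comes from).
d_in, d_out, rank = 16, 8, 2
w = rng.normal(size=(d_out, d_in)).astype(np.float32)
q_w, scale = quantize_int8(w)

# Trainable LoRA adapters in full precision. B starts at zero, so the
# adapted layer initially computes the same output as the quantized base layer.
lora_a = (rng.normal(size=(rank, d_in)) * 0.01).astype(np.float32)
lora_b = np.zeros((d_out, rank), dtype=np.float32)

def forward(x):
    # Dequantize the frozen weight on the fly; only lora_a / lora_b
    # would receive gradient updates during fine-tuning.
    return dequantize(q_w, scale) @ x + lora_b @ (lora_a @ x)

x = rng.normal(size=(d_in,)).astype(np.float32)
base_out = dequantize(q_w, scale) @ x
print(np.allclose(forward(x), base_out))  # True: with B = 0 the adapter adds nothing yet
```

The point of the sketch is the split of responsibilities: the quantized weight is treated as a constant in the forward pass, while the low-rank factors stay in float32 and absorb all the task-specific updates.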
Saeed