Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

How to quantize LLaMA during fine-tuning?

sfarzi opened this issue

I want to fine-tune LLaMA 70B (LoRA and prefix tuning) on 4 A40 GPUs.
My plan is to use a quantized version of LLaMA during the fine-tuning phase, but I have not found any implementation for this in the source code provided by lit-llama. As far as I can tell, quantization is only implemented for inference, in the generate script; there is no quantized fine-tuning path. So my question is: how do I implement quantized fine-tuning? Is there any sample?
Thanks in advance.
Saeed
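
For reference, the usual recipe for quantized fine-tuning (popularized as QLoRA) is to quantize the base weights, freeze them, and backpropagate only into a small low-rank adapter kept in higher precision. Below is a minimal sketch of that idea using bitsandbytes directly; it is not part of lit-llama's API, and the class and parameter names (`LoRALinear8bit`, `lora_a`, `lora_b`) are made up for illustration:

```python
# Minimal QLoRA-style sketch (assumes bitsandbytes is installed and a CUDA GPU
# is available). The base weight is quantized to 8-bit and frozen; only the
# LoRA factors receive gradients.
import torch
import torch.nn as nn
import bitsandbytes as bnb


class LoRALinear8bit(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Frozen 8-bit base layer; bitsandbytes quantizes the weight when the
        # module is moved to the GPU.
        self.base = bnb.nn.Linear8bitLt(
            in_features, out_features, bias=False,
            has_fp16_weights=False, threshold=6.0,
        )
        for p in self.base.parameters():
            p.requires_grad = False
        # Trainable low-rank factors: y = W8 x + (alpha / r) * B A x
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen quantized path plus trainable low-rank update.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


# Only the adapter parameters go to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=3e-4
# )
```

To use this in lit-llama you would still need to load the pretrained checkpoint into such layers and wire them into the LoRA fine-tuning script yourself; as noted in the question, the quantization support shipped in the generate script only covers inference.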