Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

How to quantize LLaMA during fine-tuning?

sfarzi opened this issue

I want to fine-tune LLaMA 70B (LoRA and prefix tuning) on 4 A40 GPUs.
My plan is to use a quantized version of LLaMA during the fine-tuning phase, but I have not found any implementation for this in the source code provided by lit-llama. As far as I can tell, quantization is only implemented for inference, in the generate script; there is no quantized fine-tuning path. So my question is: how do I implement quantized fine-tuning? Is there any sample?
Thanks in advance.
Saeed
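
For reference, the usual recipe for quantized fine-tuning (popularized as QLoRA) is to quantize the base weights, freeze them, and backpropagate only into a small low-rank adapter kept in higher precision. Below is a minimal sketch of that idea using bitsandbytes directly; it is not part of lit-llama's API, and the class and parameter names (`LoRALinear8bit`, `lora_a`, `lora_b`) are made up for illustration:

```python
# Minimal QLoRA-style sketch (assumes bitsandbytes is installed and a CUDA GPU
# is available). The base weight is quantized to 8-bit and frozen; only the
# LoRA factors receive gradients.
import torch
import torch.nn as nn
import bitsandbytes as bnb


class LoRALinear8bit(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Frozen 8-bit base layer; bitsandbytes quantizes the weight when the
        # module is moved to the GPU.
        self.base = bnb.nn.Linear8bitLt(
            in_features, out_features, bias=False,
            has_fp16_weights=False, threshold=6.0,
        )
        for p in self.base.parameters():
            p.requires_grad = False
        # Trainable low-rank factors: y = W8 x + (alpha / r) * B A x
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen quantized path plus trainable low-rank update.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


# Only the adapter parameters go to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=3e-4
# )
```

To use this in lit-llama you would still need to load the pretrained checkpoint into such layers and wire them into the LoRA fine-tuning script yourself; as noted in the question, the quantization support shipped in the generate script only covers inference.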