LoRA adds a small number of trainable parameters, i.e., adapters, to each layer of the LLM and freezes all of the original parameters. For fine-tuning, we only have to update the adapter weights, which significantly reduces the memory footprint.
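To make this concrete, here is a minimal PyTorch sketch of the idea; the class name `LoRALinear` and the hyperparameters `r` and `alpha` are illustrative choices, not taken from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank adapter (sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        # Trainable low-rank factors: A projects down to rank r, B projects back up.
        # B starts at zero so the adapter initially leaves the model unchanged.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus the low-rank update: W x + scaling * (B A) x
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Wrapping a layer this way means only `lora_A` and `lora_B` receive gradients during fine-tuning, so the optimizer state and gradient memory scale with the rank `r` rather than with the full weight matrices.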
Quantized LLMs with Low-Rank Adapters, Fine-Tuning GPT-NeoX-12B