predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Home Page: https://loraexchange.ai

Why are QLoRA (4-bit) and LoRA (16-bit) adapter file sizes the same?

codybum opened this issue · comments

This is not a LoRAX issue, but this community may have some insight into the question.

When I train a QLoRA adapter (4-bit), it clearly uses fewer resources and trains much faster. However, the saved adapter is no smaller than a similarly trained LoRA (16-bit) adapter. For small models this is not a problem, but for larger models and higher ranks the size starts to become an issue.
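For context on where the bytes go, here is a minimal sketch (the adapter path is just an example) that inspects the dtypes stored in a saved adapter file:

```python
# Inspect a saved adapter to see which dtypes it actually stores.
# The path below is an example; point it at your own adapter directory.
from safetensors import safe_open

adapter_path = "my-adapter/adapter_model.safetensors"

total_bytes = 0
with safe_open(adapter_path, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        total_bytes += tensor.numel() * tensor.element_size()
        print(name, tensor.dtype, tuple(tensor.shape))

# If the tensors come out as float32 (or float16), the file size is dominated
# by that precision regardless of how the base model was quantized.
print(f"tensor data: {total_bytes / 1e6:.1f} MB")
```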

There are numerous quantization methods and formats for full models, but I can't find much information on saving an adapter in a 4-bit format vs. 16-bit when it has been trained in 4-bit. Loading and saving 4-bit formats is mentioned here (bitsandbytes-foundation/bitsandbytes#753), but I don't know the current state of that work.

Any thoughts?

In a different thread, the response was: "The quantization is only applied to the pre-trained weights, and the trainable adapter weights remain as float32 precision. Thus whatever the quantization setting you have chosen, the adapter weights always have the same size."
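For illustration, here is a minimal sketch of that behaviour, assuming a bitsandbytes 4-bit base model and a standard PEFT LoRA config (the model name and hyperparameters are only examples): the base weights get quantized, but the injected LoRA matrices are created, trained, and later saved in full precision.

```python
# Sketch: 4-bit base model via bitsandbytes, LoRA adapter via PEFT.
# Model name and LoRA hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_config)

# Only the LoRA A/B matrices are trainable, and they are kept in full precision
# (typically float32), which is why the saved adapter is the same size either way.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, param.dtype)
```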

It seems like there should be a way to serialize adapter weights the way we serialize pre-trained models.
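Until something like the linked bitsandbytes work lands, one stopgap (assuming a small precision loss is acceptable) would be to down-cast the saved adapter tensors before distributing them, which roughly halves the size relative to float32. A rough sketch with hypothetical paths:

```python
# Down-cast a saved adapter from float32 to bfloat16 and write it back out.
# Paths are hypothetical; adjust to your own adapter directory.
import os

import torch
from safetensors.torch import load_file, save_file

state = load_file("my-adapter/adapter_model.safetensors")
state = {name: tensor.to(torch.bfloat16) for name, tensor in state.items()}

os.makedirs("my-adapter-bf16", exist_ok=True)
save_file(state, "my-adapter-bf16/adapter_model.safetensors")
```

The adapter_config.json from the original adapter directory would still need to be copied alongside the rewritten file for it to load as a normal PEFT adapter.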