TimDettmers / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Home Page: https://huggingface.co/docs/bitsandbytes/main/en/index

Dequantizing int8 models to fp16

raunaks13 opened this issue · comments

I have loaded an LLM in Hugging Face with load_in_8bit=True.
I noticed that the entries in the state_dict are named something like

  1. model.layers.18.self_attn.k_proj.weight
  2. model.layers.18.self_attn.k_proj.SCB
  3. model.layers.18.self_attn.k_proj.weight_format

The SCB and weight_format entries are present only in the quantized model. I think SCB refers to the scale and bias that can help us recreate the original tensor? weight_format is a string that says "row". The Hugging Face integration guide mentions a .CB field in addition to the .SCB field, but I could not find it in the state_dict. Perhaps the codebase has changed since that guide was written?
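
For reference, here is a minimal sketch of how I inspected these entries (the checkpoint name is just an illustrative placeholder; any model loadable with load_in_8bit=True should behave similarly):

```python
import torch
from transformers import AutoModelForCausalLM

# Load a checkpoint in 8-bit (checkpoint name is a placeholder, not a recommendation).
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    load_in_8bit=True,
    device_map="auto",
)

# Print the quantization-related entries for one projection layer.
sd = model.state_dict()
for key, value in sd.items():
    if "layers.18.self_attn.k_proj" in key:
        if torch.is_tensor(value):
            print(key, value.dtype, tuple(value.shape))
        else:
            print(key, value)  # e.g. weight_format shows up as the string "row"
```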

Anyway, I am not sure about the exact method for dequantizing the tensor to get back the original, but I tried the following:
(weight_SCB.unsqueeze(1) * weight) / 127
This gives a tensor that is close to the original model's weights (what I get without load_in_8bit=True), but not identical.
I am not sure whether this is the correct approach for dequantization. It would be great if someone could point me to code or documentation on how to recreate the exact original tensor from these weights.
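
For concreteness, here is a minimal sketch of that formula applied to the state_dict entries listed above. It assumes SCB holds the per-row absmax scale, which is how row-wise absmax quantization is usually described; the key names follow the earlier list, and `model` is the 8-bit model loaded as in the previous snippet:

```python
import torch

def dequantize_int8(weight_int8: torch.Tensor, scb: torch.Tensor) -> torch.Tensor:
    """Row-wise dequantization: if each row was quantized as
    round(127 * row / absmax(row)) and SCB stores absmax per row,
    the approximate original is weight * SCB / 127."""
    # Compute in float32 to avoid fp16 overflow in the intermediate product.
    return (weight_int8.float() * scb.float().unsqueeze(1) / 127).to(torch.float16)

sd = model.state_dict()  # `model` loaded with load_in_8bit=True as above
w_int8 = sd["model.layers.18.self_attn.k_proj.weight"]  # torch.int8
scb = sd["model.layers.18.self_attn.k_proj.SCB"]        # per-row scale
w_fp16 = dequantize_int8(w_int8, scb)
```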

As a follow-up question: I know that for some models there are outlier values that are left unquantized even though the other values in the tensor are quantized. However, I could not find this information in the state_dict. How can we find and handle these values during the dequantization process?

Hello, I came across the issue you posted about tensor dequantization and was wondering if there has been any progress since it was initially raised. Specifically, are you still using the method (weight_SCB.unsqueeze(1) * weight)/127 for dequantizing the tensor? If so, has it proved to be effective, or have you encountered any problems? I am facing a similar challenge and any insights you could share would be greatly appreciated.

I think dequantizing the tensor to recover the exact original is not possible, since the quantization process involves rounding. Based on the results I have seen in papers, however, I don't think the error matters much when the dequantized weights are used.
I haven't tested the impact of using the above formula on training/inference myself.
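
To illustrate the rounding point, here is a minimal round-trip sketch under the assumption of row-wise absmax quantization (random weights, standalone, not tied to any particular checkpoint):

```python
import torch

torch.manual_seed(0)
w = torch.randn(4096, 4096, dtype=torch.float16)

# Quantize: scale each row so its absmax maps to 127, then round to int8.
absmax = w.float().abs().max(dim=1).values               # per-row scale (what SCB would hold)
w_int8 = (w.float() * 127 / absmax.unsqueeze(1)).round().clamp(-127, 127).to(torch.int8)

# Dequantize with the formula from the thread and measure the residual error.
w_deq = (w_int8.float() * absmax.unsqueeze(1) / 127).to(torch.float16)
err = (w.float() - w_deq.float()).abs()
print(f"max abs error:  {err.max().item():.6f}")
print(f"mean abs error: {err.mean().item():.6f}")
```

The residual per element is bounded by half a quantization step (roughly absmax/254 for each row), which matches the observation above that the dequantized tensor is close to, but not exactly, the original.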

Thanks for the quick reply! I totally agree that perfect dequantization isn’t possible due to rounding during quantization. I’m planning to test the method (weight_SCB.unsqueeze(1) * weight)/127 you mentioned to see how it goes in practice. Appreciate your insights!