Vahe1994 / SpQR


Does permutation order have to be included when saving the quantized model?

luccareinehr opened this issue

I understand model saving is yet to be implemented, but it looks like permutation may increase the memory footprint of the model.

If we save an SpQR-quantized model in a file and try to dequantize it, we'll end up with a permuted version of the weight matrices (in floating point). So, to use it for inference, it would need to be de-permuted.

Is there any other way of doing inference in SpQR without having to save the permutation order?

Hello, I'm sorry for the late response.

I understand model saving is yet to be implemented

Yes, you are correct. Here is a draft PR for model saving: #32. It is almost complete, but not tested yet.

permutation may increase the memory footprint

Storing the permutation increases the memory footprint by a negligible amount, less than 0.01 bits per parameter; a rough estimate is sketched below.
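To see why the overhead is so small, here is a back-of-the-envelope sketch (an illustration only, assuming one column permutation stored as plain indices per weight matrix; this is not the exact SpQR storage format):

```python
import math

# Assumption: one permutation of the input columns is stored per weight matrix
# as a list of indices. The index list needs n_cols * ceil(log2(n_cols)) bits,
# amortized over all n_rows * n_cols quantized parameters.
def permutation_overhead_bits_per_param(n_rows: int, n_cols: int) -> float:
    perm_bits = n_cols * math.ceil(math.log2(n_cols))  # bits for the index list
    return perm_bits / (n_rows * n_cols)                # amortized per parameter

# Example: a 4096 x 4096 projection matrix
print(permutation_overhead_bits_per_param(4096, 4096))  # ~0.0029 bits/parameter
```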

If we save an SpQR-quantized model in a file and try to dequantize it, we'll end up with a permuted version of the weight matrices (in floating point). So, to use it for inference, it would need to be de-permuted.

Yes, or you can reorder the activations to match the permuted weights instead; the two are equivalent, as sketched below.
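A minimal PyTorch sketch of the two equivalent options (the names `W`, `W_p`, and `perm` are illustrative, not the SpQR checkpoint format): if quantization reordered the input columns by `perm`, you can either apply the inverse permutation to the dequantized weights once, or keep the permuted weights and reorder the incoming activations at inference time.

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 16)      # original full-precision weight (out x in)
perm = torch.randperm(16)   # column permutation used during quantization
W_p = W[:, perm]            # what dequantization gives back (permuted columns)
x = torch.randn(16)         # input activation

# Option 1: de-permute the weights once after dequantization.
inv_perm = torch.argsort(perm)
y1 = W_p[:, inv_perm] @ x

# Option 2: keep the permuted weights and reorder the activations instead.
y2 = W_p @ x[perm]

assert torch.allclose(y1, W @ x) and torch.allclose(y2, W @ x)
```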

Is there any other way of doing inference in SpQR without having to save the permutation order?

Unfortunately, we are not aware of one. As a workaround, you can skip the permutation altogether by quantizing with the identity option (no activation-order reordering), so the dequantized weights stay in their original order and no permutation needs to be stored. Usually, the difference between act_order and identity is not large; see Table 3 in the SpQR paper.

[Screenshot of Table 3 from the SpQR paper, comparing act_order and identity quantization orders.]

No worries about the time. That's awesome, thanks! :)