spcl / QuaRot

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Home Page: https://arxiv.org/abs/2404.00456

How to get models with only offline rotation (or models for weight-only quantization)

Tracin opened this issue

The previous chat is here: #22

Let me describe this in more detail. The only function I call is rotate_model; I skip the layernorm fusion and activation quantization. I just save the model once the rotation has finished, and I want to make sure the two models are the 'same'.
I think I have already dealt with the two reasons you mentioned:
For reason 1, I removed the rotate_ov_proj call from rotate_model here.
For reason 2, I removed the online Hadamard for down_proj here.
Is there anything else I should do?

Thanks! @sashkboos

Thanks @Tracin for your issue

You cannot skip layernorm_fusion, because the whole offline rotation relies on having an RMSNorm without weights (see Section 3.4 in the paper).

@sashkboos Thanks! I just realized that. In X Q Diag(a) Q^T W the Diag(a) sitting between Q and Q^T prevents Q Q^T from cancelling, but after fusing Diag(a) into W we get X Q Q^T Diag(a) W, where it does cancel. That's cool.
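To see the cancellation concretely, here is a small numerical check (my own sketch, not QuaRot code), with a random orthogonal Q standing in for the Hadamard rotation: fusing Diag(a) into W before rotating leaves the output unchanged, while keeping the RMSNorm weight between Q and Q^T does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Random orthogonal rotation Q (QuaRot uses Hadamard-based rotations; any orthogonal matrix works here).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

X = rng.standard_normal((4, d))   # activations after a weightless RMSNorm
a = rng.standard_normal(d)        # RMSNorm scale that must be fused away
W = rng.standard_normal((d, d))   # next linear layer's weight, applied as X @ W

reference = X @ np.diag(a) @ W    # unrotated computation

# Case 1: RMSNorm weight left in place -> Diag(a) sits between Q and Q^T, no cancellation.
not_fused = (X @ Q) @ np.diag(a) @ (Q.T @ W)

# Case 2: Diag(a) fused into W first -> Q Q^T cancels and the result is unchanged.
fused = (X @ Q) @ (Q.T @ np.diag(a) @ W)

print(np.allclose(not_fused, reference))  # False
print(np.allclose(fused, reference))      # True
```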

@Tracin
Can you please explain in a little more detail how you altered the script to work with Llama 3 for offline weight-only quantization? If you have a script, that would be appreciated.

@telemorne Sure. First, remove all the online Hadamard operations described in this issue.
In layernorm_fusion, you need to set the LN weights to 1.0 (and the bias to 0.0, if there is one) after they are fused.
Then call save_pretrained.
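A minimal sketch of that recipe for a Hugging Face Llama checkpoint could look like the following. The module names (input_layernorm, q_proj, ...) are the standard transformers Llama attributes; fuse_ln_into_linears is a hypothetical helper, the model id and output path are placeholders, and the commented-out rotate_model call stands in for QuaRot's rotation utility (with rotate_ov_proj and the online Hadamards removed, as discussed above), whose exact signature may differ from your local copy.

```python
import torch
from transformers import LlamaForCausalLM

# Do the fusion in high precision to avoid rounding error, then cast back before saving.
model = LlamaForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float64)

def fuse_ln_into_linears(ln, linears):
    # Fold the RMSNorm scale into the input dimension of the following linear layers,
    # then reset the norm weight so the rotated model uses a weightless RMSNorm.
    for linear in linears:
        linear.weight.data *= ln.weight.data
    ln.weight.data.fill_(1.0)

for layer in model.model.layers:
    attn, mlp = layer.self_attn, layer.mlp
    fuse_ln_into_linears(layer.input_layernorm, [attn.q_proj, attn.k_proj, attn.v_proj])
    fuse_ln_into_linears(layer.post_attention_layernorm, [mlp.gate_proj, mlp.up_proj])
fuse_ln_into_linears(model.model.norm, [model.lm_head])

# rotate_model(model, args)  # QuaRot's offline rotation, with the online Hadamards removed

model.to(torch.float16)
model.save_pretrained("llama3-rotated-weight-only")
```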

@Tracin Sorry for the numerous questions. One last thing: which class did you use to load the model with the from_pretrained function (is it LlamaForCausalLM?)? I am facing some dimensionality issues.

Yes, please save the model using LlamaForCausalLM from transformers.
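For example, loading the rotated checkpoint back is the usual transformers call (the directory name is the placeholder used in the sketch above); after fusion, the RMSNorm weights should all be 1.0:

```python
from transformers import LlamaForCausalLM

# Load the rotated, weight-only checkpoint saved earlier; the path is a placeholder.
model = LlamaForCausalLM.from_pretrained("llama3-rotated-weight-only", torch_dtype="auto")

# Sanity check: the fused RMSNorm weights should now be all ones.
print(model.model.layers[0].input_layernorm.weight)
```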