Questions about compiling QuaRot on CPU and model saving
HuangOwen opened this issue
Thanks for your awesome work! I have a few questions:
- Is it possible to compile QuaRot without CUDA? I know that the fast Hadamard kernel requires a GPU and helps preserve efficiency, but I can remove all the online Hadamard operations and only perform the weight modification. Is it possible to build a CPU-only version?
- After rotating a huggingface model, for example llama2-7b-hf (by calling `rotation_utils.fuse_layer_norms(model)` and `rotation_utils.rotate_model(model, args)`), I want to save the rotated model without quantization using `model.save_pretrained(save_path)`. However, when I load it, the `input_layernorm.weight` has a different shape and I cannot load or use the model anymore. I understand that this is because RMSNorm in the original llama and the rotated llama are different. Is there a solution to save and load the rotated model using huggingface?
Looking forward to your replies!
Thank you so much for your issue.
Is it possible to compile QuaRot without CUDA? I know that the fast Hadamard kernel requires a GPU and helps preserve efficiency, but I can remove all the online Hadamard operations and only perform the weight modification. Is it possible to build a CPU-only version?
Yes. If you remove all the online Hadamards, you will no longer need the fast-hadamard-transform repo. In addition, you can use this function to apply the Hadamard transform (though it will be slow), which should be fine on CPU.
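For the offline weight modification, the Hadamard transform can be applied on CPU with an O(n log n) butterfly and no CUDA at all. The repo's own helper is the reference; this is just a minimal pure-Python sketch of the same operation:

```python
import math

def hadamard_transform(x):
    """Fast Walsh-Hadamard transform, normalized by 1/sqrt(n).

    The normalized Hadamard matrix is its own inverse, so applying
    this twice returns the original vector. Length must be a power of two.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    x = list(x)  # work on a copy
    h = 1
    while h < n:
        # Butterfly pass: combine elements h apart with (+, -) pairs.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]
```

Since the weight rotation happens once, offline, the speed of this transform is not critical; in practice you would apply it row- or column-wise to the weight tensors (e.g. via NumPy/PyTorch) rather than to Python lists.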
After rotating a huggingface model, for example llama2-7b-hf (by calling `rotation_utils.fuse_layer_norms(model)` and `rotation_utils.rotate_model(model, args)`), I want to save the rotated model without quantization using `model.save_pretrained(save_path)`. However, when I load it, the `input_layernorm.weight` has a different shape and I cannot load or use the model anymore. I understand that this is because RMSNorm in the original llama and the rotated llama are different. Is there a solution to save and load the rotated model using huggingface?
I think you can solve this issue by first rotating the model and then loading the checkpoint.
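A sketch of that order of operations, assuming the QuaRot repo's `rotation_utils` is importable, `args` carries the same rotation settings as the original run, and `save_path` points at the directory written by `save_pretrained` (the single-file checkpoint name is an assumption; sharded checkpoints would need their shards merged):

```python
import os
import torch
import transformers
# Assumed available from the QuaRot repo:
# from quarot import rotation_utils

# Rebuild the *rotated* architecture first, instead of trying to load the
# rotated weights into the stock llama2-7b-hf model.
model = transformers.AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Re-apply the same structural changes that produced the saved checkpoint.
rotation_utils.fuse_layer_norms(model)
rotation_utils.rotate_model(model, args)  # `args` must match the original run

# Now every parameter (including input_layernorm.weight) has the rotated
# shape, so the saved state dict loads cleanly.
state_dict = torch.load(os.path.join(save_path, "pytorch_model.bin"),
                        map_location="cpu")
model.load_state_dict(state_dict)
```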