mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Home Page: https://arxiv.org/abs/2211.10438


Doesn't work on gpt models.

YaphetS-X opened this issue · comments

Hi Guangxuan, thanks for your amazing work!
I'm now working on GPT model quantization. Unlike OPT models, which are built on nn.Linear, GPT models use Conv1D, which is exactly equivalent to nn.Linear up to a transpose of the weight. Another difference between the GPT and OPT models is the positional embedding: GPT directly uses the plain nn.Embedding class, while OPT uses the OPTLearnedPositionalEmbedding class.
Based on the observations above, I wrote a GPT-to-OPT converter and modified the OPT class's forward method, and the converted model matches the original GPT model's accuracy exactly. However, when I generate the act scales, export the int8 model, and run the evaluation code on the converted OPT model, accuracy drops to 0.0. The act scales of the GPT-converted OPT model are also much larger than the scales generated from the original OPT models.
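One way to see the scale mismatch concretely is to diff the calibrated per-channel activation scales of the two checkpoints. A small diagnostic sketch, assuming the scale files produced by the repo's act-scales script are saved under hypothetical paths:

```python
import torch

# Hypothetical paths for the two calibrated scale files.
opt_scales = torch.load("act_scales/opt-125m.pt")
gpt_scales = torch.load("act_scales/gpt2-converted-opt.pt")

# Both are dicts mapping module name -> per-channel max tensor,
# so matching keys can be compared directly.
for name, opt_s in opt_scales.items():
    if name in gpt_scales:
        print(f"{name}: opt max={opt_s.max().item():.2f}, "
              f"converted max={gpt_scales[name].max().item():.2f}")
```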
Also, when I run the test_opt_decoder.py script provided by the torch-int repo, the original opt-125m gives a result of 0.0359, while the converted OPT model gives 0.2607.
I've reviewed the code many times but really can't work this out. I'd appreciate any advice if you have an idea about this. If you need more information, I can provide the converted model and the modified OPTForCausalLM code. Thanks!

Found the cause: it's the activation layer after fc1. OPT uses ReLU, but GPT uses GELU.
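This matters because ReLU output is non-negative and can be requantized with a simple static scale (the int8 OPT path fuses ReLU into the fc1 kernel), whereas GELU produces negative values and a different distribution, so reusing the ReLU-style requantization clips them. A minimal fake-quant sketch of one possible workaround, keeping GELU in floating point between the two int8 matmuls (module and scale names are illustrative, not the repo's API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FC1GELUFakeQuant(nn.Module):
    # Illustrative: run fc1, apply GELU on dequantized values,
    # then requantize with a scale calibrated on GELU outputs.
    def __init__(self, fc1: nn.Linear, out_scale: float):
        super().__init__()
        self.fc1 = fc1
        self.out_scale = out_scale  # hypothetical calibrated constant

    def forward(self, x):
        h = F.gelu(self.fc1(x))  # GELU must see real (signed) values
        # Simulated int8 requantization for the next layer's input.
        q = torch.clamp(torch.round(h / self.out_scale), -128, 127)
        return q * self.out_scale
```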

commented

@YaphetS-X Hi, have you solved the accuracy collapse caused by GELU?