Qualcomm-AI-research / transformer-quantization

A question on the nonlinear layers

Kevinpsk opened this issue · comments

Hi there,

Thanks a lot for releasing the code.
I have a question about the nonlinear layers such as GELU, softmax, and even LayerNorm (since it contains an RSQRT). If I understand your code correctly, you use the floating-point implementations of these operations in the QAT model. Does this mean we are not accurately simulating the quantized behaviour of these layers during QAT? Perhaps these layers are implemented as look-up tables or have full-integer implementations on hardware devices, so that not simulating them in QAT has minimal impact on quantized model performance? Could you clarify this a bit more? A rough sketch of what I mean by a hardware-style kernel follows below.
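To make the question concrete, here is a rough sketch (my own illustrative code, not anything from this repo) of what an integer look-up-table GELU kernel might look like on a hardware backend; the function names and the int8 scheme are assumptions for illustration only:

import torch
import torch.nn.functional as F

def build_int8_gelu_lut(input_scale, output_scale, output_zero_point=0):
    # Hypothetical int8 look-up table for GELU, of the kind a hardware kernel
    # might precompute: one output code per possible int8 input code.
    x_int = torch.arange(-128, 128, dtype=torch.float32)
    x_float = x_int * input_scale              # dequantize the 256 grid points
    y_float = F.gelu(x_float)                  # exact GELU at those points
    y_int = torch.round(y_float / output_scale) + output_zero_point
    return torch.clamp(y_int, -128, 127).to(torch.int8)

def int8_gelu(x_int8, lut):
    # Pure integer-domain GELU: index the precomputed table with the input codes.
    return lut[x_int8.to(torch.int64) + 128]

# Example usage (scales are made-up values):
# lut = build_int8_gelu_lut(input_scale=0.05, output_scale=0.02)
# y_int8 = int8_gelu(torch.randint(-128, 128, (4, 16), dtype=torch.int8), lut)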

Thanks a lot.

I am also wondering about this. Could one of the authors or developers clarify?

Yes, it looks like QDQ (quantize-dequantize, i.e. "fake" quantization) is used here: the nonlinearities themselves appear to run in floating point, with quantization applied only to their outputs. For example:

attention_probs = nn.Softmax(dim=-1)(attention_scores)

class QuantLayerNorm(QuantizationHijacker, nn.LayerNorm):

return quantize_model(nn.Sequential(m_dense, m_act), **quant_params)
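For concreteness, here is a minimal sketch of the QDQ / fake-quantization pattern these snippets suggest: the nonlinearity runs in FP32 and only its output is quantized and immediately dequantized. The fake_quantize helper and FakeQuantSoftmax wrapper are illustrative names of mine, not classes from this repo, and the asymmetric per-tensor scheme is just one possible choice:

import torch
import torch.nn as nn

def fake_quantize(x, num_bits=8):
    # Asymmetric uniform quantize-dequantize ("QDQ"): round the tensor to an
    # integer grid, then map it straight back to float, so downstream ops
    # still run in floating point but see quantization error.
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x_min / scale)
    x_q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (x_q - zero_point) * scale

class FakeQuantSoftmax(nn.Module):
    # FP32 softmax followed by fake quantization of its output, mirroring the
    # pattern in the snippets above (nonlinearity in float, QDQ on the result).
    def __init__(self, dim=-1, num_bits=8):
        super().__init__()
        self.dim = dim
        self.num_bits = num_bits

    def forward(self, x):
        return fake_quantize(torch.softmax(x, dim=self.dim), self.num_bits)

# Example usage:
# attention_probs = FakeQuantSoftmax(dim=-1)(attention_scores)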