NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Home Page: https://nvidia.github.io/TensorRT-Model-Optimizer


How to choose different alpha for mtq.INT8_SMOOTHQUANT_CFG?

siahuat0727 opened this issue · comments

Hi, I wonder whether it is possible to choose a different alpha for mtq.INT8_SMOOTHQUANT_CFG?

I found an example here and it works!

```python
quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}  # type: ignore[index]
```

But I noticed that setting alpha != 1 in SmoothQuant leads to different scales for qkv and some linear layers, which seems to prevent fusion with the previous norm layer. Shouldn't these layers have the same smooth scale for proper fusion?
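To make the concern above concrete, here is a minimal sketch of the SmoothQuant scale formula, s_j = max|X_j|^alpha / max|W_j|^(1-alpha) per input channel j. The toy matrices are illustrative, not taken from any real model: q/k/v read the same input X (the norm output) but have different weights W, so for alpha != 1 their smoothing scales differ, while at alpha == 1 the scale depends on X alone and the layers agree.

```python
# Sketch of the per-input-channel SmoothQuant scale:
#   s_j = max|X_j| ** alpha / max|W_j| ** (1 - alpha)
# Toy data only; shapes are (rows, input channels).

def col_amax(mat):
    # Column-wise max absolute value: one entry per input channel.
    return [max(abs(row[j]) for row in mat) for j in range(len(mat[0]))]

def smooth_scale(x, w, alpha):
    return [a ** alpha / b ** (1 - alpha)
            for a, b in zip(col_amax(x), col_amax(w))]

x = [[1.0, -2.0], [0.5, 4.0]]    # shared activations (the norm output)
w_q = [[0.3, -1.2], [2.0, 0.1]]  # q projection weight (toy)
w_k = [[1.5, 0.2], [-0.7, 0.9]]  # k projection weight (toy)

# alpha == 1: scale depends only on activations, so q and k agree.
assert smooth_scale(x, w_q, 1.0) == smooth_scale(x, w_k, 1.0)
# alpha == 0.5: weight statistics enter, so the scales diverge.
assert smooth_scale(x, w_q, 0.5) != smooth_scale(x, w_k, 0.5)
```

This is exactly why a shared scale is needed for folding the smoothing into the preceding norm layer: the norm can only apply one per-channel scale to its output.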

Is this a bug or am I misunderstanding something?

Thanks!

With alpha != 1, the q/k/v projections get different pre-quant scaling factors, and we run a postprocessing step to resmooth them, so this is not a bug.
The same also happens with AWQ.
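A minimal sketch of why such resmoothing is output-preserving: whichever single scale the q/k/v group ends up sharing, dividing the input by it and multiplying each weight column by it cancel exactly. The particular shared scale below is an illustrative assumption, not modelopt's actual resmoothing heuristic.

```python
# Toy demonstration: folding one shared per-channel scale into a
# weight while dividing it out of the input leaves the output unchanged.

def matvec(w, x):  # w: out x in, x: in
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

x = [1.0, -2.0, 0.5]                       # shared input (norm output)
w_q = [[0.3, -1.2, 0.7], [2.0, 0.1, -0.4]]  # one projection's weight (toy)
s_shared = [1.5, 0.8, 2.0]                 # assumed shared scale for the group

x_smooth = [xi / si for xi, si in zip(x, s_shared)]
w_q_resmoothed = [[wij * sj for wij, sj in zip(row, s_shared)] for row in w_q]

ref = matvec(w_q, x)
out = matvec(w_q_resmoothed, x_smooth)
assert all(abs(a - b) < 1e-9 for a, b in zip(ref, out))
```

Because the identity holds for any choice of shared scale, each of q, k, and v can be resmoothed onto one common scale without changing its output.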

Thanks! That clears things up regarding the rescaling for alpha != 1. Does modelopt handle the rescaling internally? Ideally, I'd love to see an example of how to grab those resmoothed scaling factors. @RalphMao

@siahuat0727 modelopt handles the rescaling internally during TensorRT-LLM checkpoint export.

There are no public examples that showcase this.