Can inference speed be measured for a QAT-quantized model?

Annmixiu opened this issue a year ago

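Hello, and first of all thank you for providing the pruning (Resrep) and performance-compensation (Acnet) methods. I have now successfully applied pruning to Transformer and Conformer, which adds to the set of models the methods had not yet been tested on. I would like to ask: for a QAT-quantized model in TensorRT format (.trt), is it still possible to measure its compute performance (for example, inference speed or throughput)? I have tried several approaches, but none of them worked. Have you previously measured the compute performance of a quantized model?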

Dahan Gong · Answer 1 · Tue May 30 2023 23:34:03 GMT+0800 (China Standard Time)

TensorRT is essentially a black box. Nvidia has its own profiler tool for measuring per-layer speed, but I am not sure how throughput should be calculated.

Python: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/Profiler.html
C++: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_profiler.html#details
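
On the throughput question, one common approach is to time repeated end-to-end inference calls and divide the number of samples processed by the elapsed wall-clock time. The following is only a minimal sketch of that idea, not something taken from TensorRT itself: `benchmark`, `run_once`, and `fake_inference` are hypothetical names, and `fake_inference` merely sleeps in place of a real engine. In practice `run_once` would wrap whatever executes the .trt engine, and it has to block until the GPU has finished (for example by synchronizing the CUDA stream), otherwise only the asynchronous launch time is measured.

```python
import statistics
import time


def benchmark(run_once, batch_size, warmup=10, iters=100):
    """Time a zero-argument inference callable and summarize the results.

    run_once   -- hypothetical callable that runs one batch and returns only
                  after the result is ready (i.e. after device synchronization)
    batch_size -- number of samples processed per call
    """
    # Warm-up calls are excluded from the statistics.
    for _ in range(warmup):
        run_once()

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    total_s = time.perf_counter() - start

    return {
        "mean_latency_ms": statistics.mean(latencies_ms),
        "median_latency_ms": statistics.median(latencies_ms),
        # Throughput = samples processed / total wall-clock time.
        "throughput_samples_per_s": batch_size * iters / total_s,
    }


def fake_inference():
    """Stand-in for a real engine call; it just sleeps for about 2 ms."""
    time.sleep(0.002)


if __name__ == "__main__":
    print(benchmark(fake_inference, batch_size=8, warmup=2, iters=20))
```

Comparing the numbers for the FP32 and the QAT-quantized engine on the same inputs and batch size gives the relative speed-up. The per-layer profiler linked above is complementary, since it shows where the time goes rather than the overall rate.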