NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
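
As a quick orientation, below is a minimal post-training quantization sketch using the library's PyTorch API. The mtq.quantize call, mtq.INT8_DEFAULT_CFG config, and forward_loop calibration hook follow the project's documented usage; the toy model and batch counts are illustrative assumptions, not code from this page.

    # Minimal PTQ sketch (toy model; details are illustrative assumptions).
    import torch
    import modelopt.torch.quantization as mtq

    model = torch.nn.Linear(16, 4)

    def forward_loop(m):
        # Run a few batches so the inserted quantizers can collect
        # activation statistics for calibration.
        for _ in range(8):
            m(torch.randn(2, 16))

    # Quantize in place and return the calibrated model, ready for export
    # to a downstream deployment framework such as TensorRT-LLM or TensorRT.
    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)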

Home Page: https://nvidia.github.io/TensorRT-Model-Optimizer


Error when quantizing ONNX model

de1star opened this issue

Hi,
This error occurred when I tried to quantize my ONNX model:

Traceback (most recent call last):
  File "quant.py", line 4, in <module>
    quantize(
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/quantize.py", line 207, in quantize
    onnx_model = quantize_func(
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/int8.py", line 186, in quantize
    quantize_static(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/quantization/quantize.py", line 513, in quantize_static
    calibrator.collect_data(calibration_data_reader)
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/ort_patching.py", line 271, in _collect_data_histogram_calibrator
    calibrator.intermediate_outputs.append(calibrator.infer_session.run(None, inputs))
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented
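
For reference, a minimal sketch of the kind of quant.py implied by the call stack above. The quantize entry point is the one visible in the traceback; the model path, calibration data, and parameter values are placeholders, not details from this issue.

    # quant.py -- hedged reconstruction of the failing call, for illustration only.
    import numpy as np

    from modelopt.onnx.quantization import quantize

    quantize(
        onnx_path="model.onnx",                  # placeholder input model
        quantize_mode="int8",                    # int8 path, per int8.py in the trace
        calibration_data=np.load("calib.npy"),   # placeholder calibration tensors
        output_path="model.quant.onnx",          # placeholder output location
    )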

I installed onnxruntime-gpu for CUDA 12.x following https://onnxruntime.ai/docs/install/.
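(For CUDA 12.x, that page installs onnxruntime-gpu from a dedicated package index; the exact command below is taken from the ONNX Runtime install docs and should be treated as an assumption, not something stated in this issue:)

    pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/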
Could you help with that?

Please share the input ONNX model and the command to reproduce the error.