Error when quantizing ONNX model
de1star opened this issue
Hi,
The following error occurred when I tried to quantize my ONNX model:
Traceback (most recent call last):
  File "quant.py", line 4, in <module>
    quantize(
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/quantize.py", line 207, in quantize
    onnx_model = quantize_func(
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/int8.py", line 186, in quantize
    quantize_static(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/quantization/quantize.py", line 513, in quantize_static
    calibrator.collect_data(calibration_data_reader)
  File "/usr/local/lib/python3.8/dist-packages/modelopt/onnx/quantization/ort_patching.py", line 271, in _collect_data_histogram_calibrator
    calibrator.intermediate_outputs.append(calibrator.infer_session.run(None, inputs))
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented
I installed onnxruntime-gpu for CUDA 12.x following the instructions at https://onnxruntime.ai/docs/install/.
Could you help with that?
Please share the input ONNX model and the command to reproduce the error.
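Along with the model and command, it may also help to attach a short environment report showing which ONNX Runtime build and execution providers are actually visible to Python. The sketch below is only one way to collect that. The helper names `describe_providers` and `collect_report` are hypothetical and not part of onnxruntime or modelopt. It does not diagnose the CopyTensorAsync failure itself; it just records whether the CUDA execution provider is listed.

```python
"""Environment report to attach alongside a reproduction (illustrative sketch)."""
import importlib


def describe_providers(available):
    """Summarize a list of ONNX Runtime execution provider names (hypothetical helper)."""
    if "CUDAExecutionProvider" in available:
        return "CUDA execution provider is available"
    if available:
        return "CUDA execution provider is NOT available; found: " + ", ".join(available)
    return "no execution providers reported"


def collect_report():
    """Gather version and provider info (hypothetical helper, not a library API)."""
    try:
        # Imported lazily so the script still reports something if the package is missing.
        ort = importlib.import_module("onnxruntime")
    except ImportError:
        return {
            "onnxruntime": None,
            "providers": [],
            "summary": "onnxruntime is not installed",
        }
    providers = ort.get_available_providers()
    return {
        "onnxruntime": ort.__version__,
        "device": ort.get_device(),  # reports the device the installed build targets
        "providers": providers,
        "summary": describe_providers(providers),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Running this in the same Python environment as quant.py and pasting the output into the issue gives maintainers the ONNX Runtime version and provider list without guesswork.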