Runtime hangs during inference of a YOLOv5 model compiled from ONNX
maximiliankir opened this issue · comments
What happened?
When running inference of a YOLOv5 model that has been imported from ONNX and compiled for the CUDA backend, the execution never terminates; it hangs on the first input image.
CPU usage is ~0%, so I suspect the execution is stuck somewhere rather than busy.
Steps to reproduce your issue
Imported with iree-import-onnx yolov5s.onnx -o yolov5_onnx.mlir
Compiled with iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_87 yolov5_onnx.mlir -o yolov5_onnx_cuda.vmfb
Used the runtime in Python with:
...
# Load flatbuffer of yolo model from file
with open(iree_fb_path, "rb") as f:
    flatbuffer = f.read()
gpu_device = ireert.get_device("cuda")
config = ireert.Config(device=gpu_device)
# TODO Gives warning about unsafe copy of unaligned VmModule buffer
yolo_module = ireert.VmModule.from_flatbuffer(config.vm_instance, flatbuffer)
modules = config.default_vm_modules + (yolo_module,)
context = ireert.SystemContext(vm_modules=modules, config=config)
invoker = context.modules.module["torch_jit"]
batch = ireert.asdevicearray(gpu_device, preprocessed_img)
result = invoker(batch)
...
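`preprocessed_img` isn't shown in the report; for completeness, here is a minimal NumPy-only sketch of typical YOLOv5 preprocessing (letterbox to a 640×640 square, scale to [0, 1], HWC→NCHW with a batch dimension). The input shape and the 640 target size are assumptions, not taken from the original report:

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Minimal YOLOv5-style preprocessing: letterbox to a square,
    normalize to [0, 1], and reorder HWC -> NCHW with a batch dim."""
    h, w, _ = img.shape
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)  # gray padding
    scale = min(size / h, size / w)
    nh, nw = int(h * scale), int(w * scale)
    # Nearest-neighbor resize via index arrays (avoids an OpenCV dependency).
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    canvas[:nh, :nw] = img[ys][:, xs]
    x = canvas.astype(np.float32) / 255.0   # normalize to [0, 1]
    return x.transpose(2, 0, 1)[None, ...]  # HWC -> 1x3xHxW

preprocessed_img = preprocess(np.zeros((480, 640, 3), dtype=np.uint8))
print(preprocessed_img.shape)  # (1, 3, 640, 640)
```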
What component(s) does this issue relate to?
No response
Version information
candidate-20240605.915
Additional context
It works when the model is compiled via the tf-importer from a TensorFlow SavedModel. The TF model uses FP32, the ONNX model FP16. That's one of the reasons why I want to import from the ONNX model.
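One thing worth double-checking given the FP32/FP16 difference: the host-side buffer passed to `ireert.asdevicearray` has to be float16 for the ONNX path, or the input element type won't match the compiled function's signature. A minimal NumPy sketch (the variable names are placeholders, not from the original snippet):

```python
import numpy as np

# FP32 output of the preprocessing stage (placeholder data).
preprocessed_img = np.zeros((1, 3, 640, 640), dtype=np.float32)

# Cast to FP16 to match the ONNX model's expected input element type
# before uploading the buffer to the CUDA device.
fp16_input = preprocessed_img.astype(np.float16)
print(fp16_input.dtype)  # float16
```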
I attached a snippet of the MLIR file produced by the ONNX importer (without weights).
I need some advice on how to debug this further. How can I find the part where the execution gets stuck?
It seems to be the combination of ONNX and CUDA. ONNX frontend with CPU works. TF frontend and CUDA works too.
Could be related to #17376. @ScottTodd Can you tell me what you learned there?
Sounds like #16666 is showing up outside of the individual op tests. Not sure, needs further debugging through the stack.
I ran the model with --trace_execution. It seems to get stuck on dealloca of the CUDA device.
The trace is attached.
onnx_cuda_trace.log