iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

Runtime stuck on execution of inference of YOLOv5 compiled from ONNX

maximiliankir opened this issue · comments

What happened?

When running inference with a YOLOv5 model that was imported from ONNX and compiled for the CUDA backend, the execution does not terminate; it just keeps running on the first input image.
CPU usage is ~0%, so I suspect the execution is stuck somewhere.

Steps to reproduce your issue

Imported with iree-import-onnx yolov5s.onnx -o yolov5_onnx.mlir
Compiled with iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_87 yolov5_onnx.mlir -o yolov5_onnx_cuda.vmfb

Use the runtime in Python with:

...
# Load flatbuffer of yolo model from file
with open(iree_fb_path, "rb") as f:
    flatbuffer = f.read()

gpu_device = ireert.get_device("cuda")
config = ireert.Config(device=gpu_device)
# TODO Gives warning about unsafe copy of unaligned VmModule buffer
yolo_module = ireert.VmModule.from_flatbuffer(config.vm_instance, flatbuffer)
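# NOTE: an alternative that may avoid the unaligned-copy warning above is mapping
# the file instead of reading it (assumption: this runtime build exposes mmap):
#   yolo_module = ireert.VmModule.mmap(config.vm_instance, iree_fb_path)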
modules = config.default_vm_modules + (yolo_module,)

context = ireert.SystemContext(vm_modules=modules, config=config)

invoker = context.modules.module["torch_jit"]

batch = ireert.asdevicearray(gpu_device, preprocessed_img)
result = invoker(batch)
...
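For completeness, once the invocation does return, reading the output back on the host looks roughly like this (a sketch only; it assumes the call returns a single DeviceArray):

import numpy as np

# Copy the inference output from the CUDA device back to the host.
host_result = result.to_host()
# Plain NumPy array for the usual YOLOv5 post-processing (box decoding, NMS).
detections = np.asarray(host_result)
print(detections.shape, detections.dtype)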

What component(s) does this issue relate to?

No response

Version information

candidate-20240605.915

Additional context

It works when the model is compiled using the tf-importer from a TensorFlow SavedModel. The TF model uses FP32, while the ONNX model uses FP16; that's one of the reasons why I want to import from the ONNX model.

I attached a snippet of the MLIR file produced by the ONNX importer (without weights).

yolov5s.mlir.txt

I need some advice on how to debug this further. How can I find the part where the execution gets stuck?
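
For reference, a direct way to reproduce this outside Python (a sketch, not verified here; the function name is taken from the snippet above and the 1x3x640x640 FP16 input is only a guess) would be to run the module with VM tracing enabled:

iree-run-module --module=yolov5_onnx_cuda.vmfb --device=cuda --function=torch_jit --input="1x3x640x640xf16=0" --trace_execution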

It seems to be the combination of ONNX and CUDA: the ONNX frontend with CPU works, and the TF frontend with CUDA works too.
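
For reference, the equivalent CPU compile (a sketch mirroring the CUDA command above) would be something like:

iree-compile --iree-hal-target-backends=llvm-cpu yolov5_onnx.mlir -o yolov5_onnx_cpu.vmfb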

Could be related to #17376. @ScottTodd Can you tell me what you learned there?

Sounds like #16666 is showing up outside of the individual op tests. Not sure, needs further debugging through the stack.

I ran the model with --trace_execution. It seems to get stuck on the dealloca of the CUDA device.

The trace is attached.
onnx_cuda_trace.log