NVIDIA-AI-IOT / nanosam

A distilled Segment Anything (SAM) model capable of running real-time with NVIDIA TensorRT

trtexec fails to build /mobile_sam_mask_decoder.onnx

fdarvas opened this issue

Trying to run:

trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10

after successfully exporting mobile_sam_mask_decoder.onnx with:
python3 -m nanosam.tools.export_sam_mask_decoder_onnx --model-type=vit_t --checkpoint=assets/mobile_sam.pt --output=/mnt/e/data/mobile_sam_mask_decoder.onnx

resulting in this error:

onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/18/2023-11:39:43] [E] Error[4]: [graph.cpp::symbolicExecute::539] Error Code 4: Internal Error (/OneHot: an IIOneHotLayer cannot be used to compute a shape tensor)
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:771: While parsing node number 146 [Tile -> "/Tile_output_0"]:
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:773: input: "/Unsqueeze_3_output_0"
input: "/Reshape_2_output_0"
output: "/Tile_output_0"
name: "/Tile"
op_type: "Tile"

[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph:
[6] Invalid Node - /Tile
[graph.cpp::symbolicExecute::539] Error Code 4: Internal Error (/OneHot: an IIOneHotLayer cannot be used to compute a shape tensor)
[12/18/2023-11:39:43] [E] Failed to parse onnx file
[12/18/2023-11:39:43] [I] Finished parsing network model. Parse time: 0.32614
[12/18/2023-11:39:43] [E] Parsing model failed
[12/18/2023-11:39:43] [E] Failed to create engine from model or file.
[12/18/2023-11:39:43] [E] Engine set up failed

I had the same problem.
Have you solved it yet?

Unfortunately I don't have a solution for it yet.

Bump...

Same issue - any help would be much appreciated; thanks!

A possible workaround: sanitize the exported ONNX file with Polygraphy's constant folding:

polygraphy surgeon sanitize data/mobile_sam_mask_decoder.onnx --fold-constants -o data/mobile_sam_mask_decoder_folded.onnx --fold-size-threshold 64

You might also need to install onnx-graphsurgeon.
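
If either tool is missing, both are published on PyPI (on older setups onnx-graphsurgeon may need to come from NVIDIA's package index at pypi.ngc.nvidia.com instead):

python3 -m pip install polygraphy onnx-graphsurgeon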

The resulting ONNX files obtained by the above have no OneHot op and can be converted to TensorRT without problems, using the same trtexec command with --onnx pointed at the sanitized file.


More details

Inspecting with Netron, the ONNX file exported by the following command contains a OneHot op.

python3 -m nanosam.tools.export_sam_mask_decoder_onnx --model-type=vit_t --checkpoint=assets/mobile_sam.pt --output=/mnt/e/data/mobile_sam_mask_decoder.onnx

However, the ONNX file provided by the Google Drive link in README.md does not contain a OneHot op; it appears to be replaced by constant tensors and a Where op. I don't know how that file was exported from scratch.
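
The same inspection can also be done without Netron. A minimal sketch using the onnx Python package (the file path is taken from the commands above; adjust it to whichever file you want to check):

import onnx

# Collect the op types present in the graph and check for OneHot.
model = onnx.load("data/mobile_sam_mask_decoder.onnx")
ops = sorted({node.op_type for node in model.graph.node})
print("OneHot" in ops)
print(ops)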

Two possible workarounds (use either one):

  • Use torch==2.0.1; the latest torch versions export a OneHot op in the output ONNX file, which cannot be parsed by TensorRT
  • Replace the file nanosam/mobile_sam/modeling/mask_decoder.py with this gist code, which works well for most torch versions (see the sketch below)
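
For context, a likely source of the OneHot op is torch.repeat_interleave in the mask decoder, which newer torch versions appear to lower to OneHot + Tile during ONNX export (consistent with the failing /OneHot and /Tile nodes in the trtexec log above). I haven't diffed the gist, but a minimal sketch of that kind of replacement (the helper name here is hypothetical, not taken from the gist) would be:

import torch

def repeat_dim0(x: torch.Tensor, n: int) -> torch.Tensor:
    # Behaves like torch.repeat_interleave(x, n, dim=0), but traces to
    # Expand/Reshape in the exported ONNX graph instead of OneHot + Tile.
    return x.unsqueeze(1).expand(-1, n, *x.shape[1:]).reshape(-1, *x.shape[1:])

# e.g. in MaskDecoder.predict_masks, a call such as
#   src = torch.repeat_interleave(image_embeddings, tokens.shape[0], dim=0)
# would become
#   src = repeat_dim0(image_embeddings, tokens.shape[0])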