NVIDIA-AI-IOT / CUDA-PointPillars

A project demonstrating how to use CUDA-PointPillars to process point cloud data from lidar.

Mismatch in number of detections: TensorRT ONNX inference vs. PyTorch .pth file

Allamrahul opened this issue

Dataset: I am using a custom dataset of .npy files with annotations. I followed all the steps required for custom dataset preparation and I get great results with PyTorch: 90% mAP on my eval set.

However, once I convert the .pth file to ONNX format using exporter.py, I see fewer detections for every point cloud in my eval dataset when running TensorRT inference with the C++ script than I get from PyTorch with the .pth file.

The export process uses exporter.py and simplifier_onnx.py. However, both scripts are hardcoded for the 3 classes of the KITTI dataset, and I have just one class to detect. I therefore referred to the following commit to make the ONNX export work: https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/pull/77/commits. After this, I was able to export, but I then ran into the following issue: #82. I resolved it by tinkering with the export script, as mentioned in the following comment: #77 (comment). After this, my detections from TensorRT with the ONNX model were at least a subset of what I was seeing with PyTorch, but not the whole set. There is a clear delta between the TensorRT ONNX results and the PyTorch .pth results for the majority of my eval set. This can be seen in the following table:

Bounding box delta comparison: PyTorch .pth vs. TensorRT ONNX

Columns: File, PyTorch .pth detections, TensorRT C++ detections using the .onnx file (separated by ';'), Delta (difference in number of detections)
000000.npy tensor([[  9.6498,   1.1609,   1.9397,   0.2856,   0.4898,   2.8947,   6.2814],         [ 24.8358,   1.3459,   2.5912,   0.2332,   0.4984,   3.0438,   6.2827],         [ 24.9936, -10.4810,   3.2429,   0.2568,   0.4702,   3.1647,   6.2816],         [  9.8542, -10.6894,   2.1888,   0.4316,   0.4553,   2.7412,   6.2486]],        device='cuda:0') 24.8358 1.34592 2.59117 0.23324 0.498444 3.04383 6.28266 0 0.46325 ; 24.9936 -10.481 3.24294 0.256755 0.47017 3.16474 6.28156 0 0.445165 ; 9.8573 -10.6925 2.17166 0.433223 0.452724 2.7258 6.24912 0 0.445157 1
000001.npy tensor([[  9.6501,   1.1778,   1.8507,   0.2533,   0.4935,   2.7208,   6.2741],         [ 24.9947, -10.4883,   3.0557,   0.2706,   0.4838,   3.0915,   6.2594],         [ 24.8404,   1.3479,   2.6033,   0.2287,   0.4947,   3.0391,   6.2825],         [  9.8570, -10.6883,   2.1663,   0.4322,   0.4521,   2.7124,   6.2346]],        device='cuda:0') 9.65337 1.1817 1.80798 0.248034 0.493837 2.66008 6.27361 0 ; 24.9947 -10.4883 3.05572 0.270619 0.483843 3.09145 6.25942 0 0.670895 ; 24.8404 1.34787 2.60326 0.228719 0.494724 3.03909 6.28252 0 0.459299 ; 9.8545 -10.6925 2.1472 0.438129 0.448904 2.7132 6.23376 0 0.424986 0
000002.npy tensor([[  9.6042,   1.1503,   2.0593,   0.2839,   0.4955,   2.9902,   6.3128],         [ 24.7882,   1.3638,   2.6522,   0.2538,   0.5039,   3.1623,   6.2903],         [  9.7436, -10.6760,   2.1350,   0.3712,   0.4578,   2.6609,   6.2507],         [ 24.9494, -10.5134,   3.2150,   0.2888,   0.4944,   3.3462,   6.2143]],        device='cuda:0') 9.74478 -10.6817 2.1041 0.374984 0.453993 2.63108 6.25019 0 0.532783 ; 24.9494 -10.5134 3.21504 0.288844 0.494413 3.34624 6.21432 0 0.515557 ; 0.309276 -10.6853 2.08503 0.458935 0.413923 3.13058 6.09365 0 0.412784 1
000003.npy tensor([[  9.5610, -10.4589,   2.1206,   0.4139,   0.4505,   2.7193,   6.2802],         [ 24.3758,   1.7272,   2.6000,   0.2396,   0.4966,   3.0571,   6.1985],         [ 24.7097, -10.1406,   3.0566,   0.2619,   0.4718,   3.0835,   6.2728],         [  9.2311,   1.3354,   1.8251,   0.2543,   0.4891,   2.7015,   6.2441],         [  8.9262,   7.8720,   2.1033,   0.3872,   0.4424,   2.7067,   6.3819]],        device='cuda:0') 9.56115 -10.4598 2.09798 0.418282 0.448642 2.68469 6.27597 0 0.735731 ; 24.3758 1.72724 2.59998 0.239596 0.496643 3.05714 6.19854 0 0.629267 ; 24.7097 -10.1406 3.0566 0.26186 0.471776 3.08349 6.27275 0 0.585723 ; 9.21606 1.33047 1.82858 0.254299 0.490583 2.66956 6.23728 0 0.471899 1
000004.npy tensor([[ 6.4732,  2.6481,  1.7006,  0.2879,  0.4678,  2.6444,  6.3118],         [21.4290,  4.8774,  2.5937,  0.2325,  0.5022,  3.1258,  6.4040],         [23.1383, -6.8599,  2.7714,  0.2839,  0.4960,  3.0160,  6.3080],         [ 8.1175, -8.9831,  2.2486,  0.3856,  0.4450,  2.7676,  6.3550]],        device='cuda:0') 23.1383 -6.85986 2.77142 0.283893 0.495966 3.01596 6.30801 0 0.580739 ; 8.11463 -8.9818 2.12152 0.396575 0.436063 2.65015 6.35895 0 0.429396 2
000005.npy tensor([[ 5.5251,  2.7731,  1.6679,  0.3284,  0.4662,  2.6940,  6.2788],         [20.4834,  5.0487,  2.5489,  0.2769,  0.5241,  3.1817,  6.4027],         [ 7.3220, -8.8810,  2.1011,  0.4506,  0.4281,  2.6641,  6.3688],         [22.2850, -6.6383,  2.6867,  0.2744,  0.4986,  3.0367,  6.3119]],        device='cuda:0') 7.32207 -8.88152 2.0861 0.445914 0.430497 2.6552 6.36896 0 0.696223 3
000006.npy tensor([[18.0280,  4.9469,  2.4509,  0.3035,  0.5205,  3.1520,  6.3221],         [19.8413, -6.7181,  2.7475,  0.3097,  0.5246,  3.2910,  6.3001],         [ 3.1871,  2.6373,  1.7287,  0.4621,  0.4224,  2.9021,  6.3156],         [ 4.8621, -8.9172,  1.8402,  0.4540,  0.3952,  2.5332,  6.3420],         [32.0742,  7.1384,  3.3039,  0.2361,  0.4806,  3.3647,  6.4108],         [21.2824, 12.1162,  3.6256,  0.2676,  0.4659,  3.5638,  6.5643],         [ 0.6082,  4.4304,  1.8762,  0.4470,  0.4348,  3.4172,  6.2065]],        device='cuda:0') 4.85492 -8.92965 1.819 0.460386 0.396642 2.5153 6.34298 0 0.494817 6
000007.npy tensor([[18.2038, -6.8837,  2.5308,  0.3099,  0.5277,  3.1208,  6.3168],         [16.5025,  4.7925,  2.3577,  0.3065,  0.5248,  3.0787,  6.3005],         [ 1.5735,  2.6487,  1.6249,  0.5034,  0.4109,  2.6605,  6.3160],         [ 2.2250,  2.7058,  1.8312,  0.4703,  0.4060,  3.0384,  6.3380],         [ 3.2350, -8.9478,  1.8462,  0.4438,  0.4085,  2.5771,  6.3109],         [19.7396, 11.9755,  3.2925,  0.2890,  0.5000,  3.6453,  6.5671],         [ 3.5311,  2.8095,  2.3147,  0.4571,  0.4455,  4.2559,  6.3274],         [30.5054,  6.8140,  3.3753,  0.2804,  0.5016,  3.6093,  6.2777]],        device='cuda:0') 18.2057 -6.88499 2.4907 0.307031 0.527094 3.07328 6.31815 0 0.636754 ; 16.502 4.79033 2.33373 0.299566 0.523598 3.0561 6.3044 0 0.532995 ; 1.56738 2.64373 1.68283 0.506594 0.412098 2.66617 6.31967 0 0.51762 ; 3.22002 -8.95614 1.8366 0.449459 0.409571 2.56386 6.3068 0 0.431358 ; 2.2279 2.70934 1.85016 0.464891 0.40516 3.07841 6.33425 0 0.391239 ; 19.7397 11.9755 3.29258 0.28902 0.499917 3.64496 6.56848 0 0.381675 2
000008.npy tensor([[ 8.7021, -7.9169,  2.6375,  0.3647,  0.4888,  3.5404,  6.2655],         [ 7.7196,  3.7774,  2.3025,  0.4060,  0.4704,  3.2993,  6.2707],         [22.8483, -6.6640,  3.5341,  0.3350,  0.5277,  4.1040,  6.3141],         [21.7832,  5.1120,  2.8534,  0.2781,  0.5178,  3.2145,  6.1912],         [ 3.2359, -8.4495,  2.0291,  0.4187,  0.4105,  3.2451,  6.2915]],        device='cuda:0') 8.70127 -7.92042 2.62612 0.36539 0.486129 3.51703 6.26476 0 0.864963 ; 7.6994 3.79393 2.24546 0.40736 0.469539 3.21603 6.25044 0 0.73586 ; 22.8483 -6.66398 3.53411 0.335008 0.527745 4.10398 6.31413 0 0.605781 ; 21.7832 5.11193 2.85462 0.278421 0.517415 3.21271 6.21329 0 0.508611 ; 1
000009.npy tensor([[19.5711,  4.7877,  2.6956,  0.3077,  0.5412,  3.3734,  6.2451],         [ 6.3672, -8.0972,  2.7778,  0.4181,  0.4778,  4.1039,  6.2421],         [ 5.4901,  3.6080,  2.3323,  0.4340,  0.4502,  3.7175,  6.2740],         [20.3728, -7.0433,  3.3803,  0.3514,  0.5351,  4.1972,  6.3070],         [26.6330, 11.8861,  3.9950,  0.3089,  0.5019,  4.1503,  6.6127]],        device='cuda:0') 5.47306 3.61103 2.394 0.432978 0.453338 3.80027 6.32163 0 0.714706 ; 19.5717 4.78751 2.71062 0.308163 0.539413 3.36241 6.27686 0 0.621834 ; 6.35329 -8.10289 2.76789 0.422266 0.47866 4.13415 6.24032 0 0.606208 2
000010.npy tensor([[18.3196,  4.6323,  3.2815,  0.3700,  0.5370,  4.5950,  6.3164],         [ 5.0913, -8.1561,  2.6470,  0.4329,  0.4667,  4.0704,  6.2747],         [19.1831, -7.1906,  3.3499,  0.3578,  0.5279,  4.2080,  6.3127],         [ 2.5482,  4.3696,  1.6065,  0.4281,  0.3918,  2.8003,  6.2634]],        device='cuda:0') 5.08485 -8.16716 2.64149 0.431825 0.466464 4.03816 6.27571 0 0.731938 ; 19.1846 -7.19002 3.2872 0.352221 0.529464 4.08496 6.31286 0 0.591408 2
000011.npy tensor([[15.3577, -7.3005,  3.0413,  0.3812,  0.5104,  4.2909,  6.3159],         [ 0.6093,  3.4074,  1.9033,  0.5056,  0.4306,  3.3583,  6.1790],         [14.5397,  4.4909,  3.0513,  0.3723,  0.5222,  4.3821,  6.2383],         [30.4700, -6.2796,  4.0225,  0.2914,  0.4843,  3.8403,  6.3179],         [29.6795,  5.5980,  4.0535,  0.2816,  0.4877,  3.9741,  6.2869]],        device='cuda:0') 0.594493 3.41456 2.11992 0.502219 0.441799 3.74912 6.17387 0 0.828488 ; 15.3587 -7.29961 2.99875 0.375657 0.512654 4.18005 6.31556 0 0.798267 ; 30.47 -6.27963 4.02255 0.29143 0.484331 3.84032 6.31788 0 0.434042 2
000012.npy tensor([[ 11.2944,   4.3980,   3.0133,   0.3911,   0.5198,   4.6365,   6.2670],         [ 26.4576,   5.3648,   3.6263,   0.3002,   0.5062,   3.8833,   6.3176],         [ 12.0963,  -7.3715,   3.0630,   0.3846,   0.5122,   4.3017,   6.2922],         [  8.1463, -12.5014,   2.9129,   0.3691,   0.4980,   3.9686,   6.1562],         [ 27.1433,  -6.4810,   3.9175,   0.3048,   0.5110,   3.9699,   6.3372],         [ 18.4373,  11.4960,   3.7129,   0.3159,   0.4918,   4.2750,   6.4670]],        device='cuda:0') 8.14566 -12.506 2.84502 0.364298 0.498799 3.84938 6.15557 0 0.378752 ; 12.0904 -7.37816 2.90519 0.378811 0.516209 4.00902 6.29017 0 0.376648 4
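
The comparison in the table can also be automated instead of being read off by eye. The following is a minimal sketch, not part of the repository: it assumes each detection in the C++ inference output is a run of whitespace-separated numbers (seven box values, then a class id and a score, as the rows above appear to be laid out) with detections separated by ';'. The helper names `parse_trt_line` and `count_unmatched` and the 0.5 distance threshold are hypothetical choices made for illustration.

```python
import math

def parse_trt_line(text):
    """Parse '; '-separated detections into lists of floats (hypothetical helper)."""
    boxes = []
    for chunk in text.split(";"):
        values = [float(v) for v in chunk.split()]
        if len(values) >= 7:  # keep entries that carry at least the 7 box values
            boxes.append(values)
    return boxes

def count_unmatched(ref_boxes, test_boxes, max_center_dist=0.5):
    """Count reference boxes with no test box center within max_center_dist.

    Matching is greedy and one-to-one on the first three values (box center).
    The threshold is an assumption and should be tuned to the object size.
    """
    used = set()
    unmatched = 0
    for ref in ref_boxes:
        best, best_dist = None, max_center_dist
        for i, test in enumerate(test_boxes):
            if i in used:
                continue
            dist = math.dist(ref[:3], test[:3])
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is None:
            unmatched += 1
        else:
            used.add(best)
    return unmatched

# Values copied from the 000005.npy row of the table above.
pytorch_boxes = [
    [5.5251, 2.7731, 1.6679, 0.3284, 0.4662, 2.6940, 6.2788],
    [20.4834, 5.0487, 2.5489, 0.2769, 0.5241, 3.1817, 6.4027],
    [7.3220, -8.8810, 2.1011, 0.4506, 0.4281, 2.6641, 6.3688],
    [22.2850, -6.6383, 2.6867, 0.2744, 0.4986, 3.0367, 6.3119],
]
trt_boxes = parse_trt_line(
    "7.32207 -8.88152 2.0861 0.445914 0.430497 2.6552 6.36896 0 0.696223"
)
missing = count_unmatched(pytorch_boxes, trt_boxes)
print(missing)  # 3, the same delta the table lists for 000005.npy
```

Matching on box centers rather than only comparing counts also shows which detections are missing, which may help tell a thresholding difference from a genuine numerical mismatch.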

Please let me know if you know something that could help me.

I see the same behavior with the KITTI dataset as well:
[image]
Can anyone confirm whether this is expected behavior or whether it is not supposed to happen?

Hello, can you tell me how much the 3D detection performance drops?

Hi, as shown in my initial comment, the delta between the PyTorch .pth and TensorRT inference is as large as 6 detections for 000006.npy. I have about 30 evaluation point clouds and I see this drop in 90% of them. Is there anything I can do to avoid this?
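
One way to put a number on the drop for the 13 files listed in the table is to aggregate the per-file deltas. The sketch below is only an illustration: the counts are read off the table in the first comment, the function name `summarize` is hypothetical, and the result measures detections missing relative to the PyTorch output, not a change in mAP.

```python
# (number of PyTorch detections, delta) per file, read off the table above
rows = {
    "000000.npy": (4, 1), "000001.npy": (4, 0), "000002.npy": (4, 1),
    "000003.npy": (5, 1), "000004.npy": (4, 2), "000005.npy": (4, 3),
    "000006.npy": (7, 6), "000007.npy": (8, 2), "000008.npy": (5, 1),
    "000009.npy": (5, 2), "000010.npy": (4, 2), "000011.npy": (5, 2),
    "000012.npy": (6, 4),
}

def summarize(rows):
    """Return the share of affected files and the share of missing detections."""
    total = sum(count for count, _ in rows.values())
    missing = sum(delta for _, delta in rows.values())
    affected = sum(1 for _, delta in rows.values() if delta > 0)
    return {
        "files_affected": affected / len(rows),
        "missing_fraction": missing / total,
    }

summary = summarize(rows)
print(f"files affected: {summary['files_affected']:.1%}")        # 92.3%
print(f"detections missing: {summary['missing_fraction']:.1%}")  # 41.5%
```

For these 13 files, 12 show a delta and 27 of the 65 PyTorch detections are absent from the C++ output, which is consistent with the roughly 90% figure quoted for the full evaluation set.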