Different predictions with Tensorrt7 for a refinedet model

Question

YaYaB opened this issue 4 years ago · comments

Configuration

Version of DeepDetect:
- Locally compiled on:
  - Ubuntu 14.04 LTS
  - Ubuntu 18.04 LTS
  - Mac OSX
  - Other:
- Docker
- Amazon AMI
Commit (shown by the server when starting):
0eac8a4

Your question / the problem you're facing:

I successfully installed a version of DD with tensorrt backend and tensorrt-oss.
A RefineDet model loads correctly; however, the results are very different from those we obtain with the original model in Caffe.

The following steps will help reproduce:

Download the model

PATH_MODEL="PATH_WANTED"
mkdir $PATH_MODEL && cd $PATH_MODEL
wget https://deepdetect.com/models/init/desktop/images/detection/faces_512.tar.gz
wget -O test_image.jpeg https://miro.medium.com/max/6528/1*DYUaxku5bfbZaDLW-4SyWg.jpeg
tar -xvf faces_512.tar.gz

Error message (if any) / steps to reproduce the problem:

Let us first use the caffe version of this model
Load the model

list of API calls:

curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"caffe",
"description":"image classification service",
"type":"supervised",
"parameters":{
    "input":{
    "connector":"image",
    "width":512,
    "height":512
    },
    "mllib": {
       "nclasses": 2
    }
},
"model":{
    "repository":"PATH_MODEL"
}
}'

Server log output:

[2020-08-10 12:15:47.549] [imageserv] [info] Using pre-trained weights from PATH_MODEL/model_iter_20000.caffemodel
[2020-08-10 12:15:47.648] [caffe] [info] Ignoring source layer label_data_1_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_loc_ftune_arm_loc_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_conf_ftune_arm_conf_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_priorbox_arm_priorbox_0_split
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer arm_loss
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer odm_loss
[2020-08-10 12:15:47.675] [imageserv] [info] Net total flops=95632720896 / total params=33904320
[2020-08-10 12:15:47.675] [imageserv] [info] detected network type is detection
[2020-08-10 12:15:47.675] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 3867

Make a prediction

list of API calls:

curl -X POST "http://localhost:8080/predict" -d '{
      "service":"imageserv",
      "parameters":{
        "input":{
          "width":512,
          "height":512
        },
        "output":{
          "confidence_threshold": 0.5,
          "bbox": true
        }
      },
      "data": ["PATH_MODEL/test_image.jpeg"]
    }'

Server log output:

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":4203.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.0640869140625,"xmin":1492.4638671875,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.1898193359375,"xmax":960.9320678710938,"xmin":201.794677734375,"ymin":456.78582763671877},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.03515625,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230285644531},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}

We obtain 3 faces as expected (see the image here).

Now we do the same using the TensorRT version of the model.
Load the model

list of API calls:

curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"tensorrt",
"description":"image classification service",
"type":"supervised",
"parameters":{
    "input":{
    "connector":"image",
    "width":512,
    "height":512,
    "mean": [104.146, 110.808, 119.856]
    },
    "mllib": {
       "datatype": "fp32",
       "nclasses": 1,
       "maxBatchSize": 6,
       "maxWorkspaceSize": 1000,
       "gpuid":0
    }
},
"model":{
    "repository":"PATH_MODEL"
}
}'

Server log output:

DeepDetect [ commit 0eac8a49404292529a6ea1810290af800092902c ]
[2020-08-10 12:19:38.712] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - GridAnchor_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - NMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Reorg_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Region_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PriorBox_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Normalize_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - RPROI_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchedNMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - FlattenConcat_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - CropAndResize
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Proposal
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchTilePlugin_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - DetectionLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ProposalLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PyramidROIAlign_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ResizeNearest_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - SpecialSlice_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - InstanceNormalization_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-10 12:19:40.624] [imageserv] [info] setting max batch size to 6

Make predictions

list of API calls:

curl -X POST "http://localhost:8080/predict" -d '{
      "service":"imageserv",
      "parameters":{
        "input":{
          "width":512,
          "height":512
        },
        "output":{
          "confidence_threshold": 0.5,
          "bbox": true
        }
      },
      "data": ["PATH_MODEL/test_image.jpeg"]
    }'

Server log output:

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":86.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1008.7504272460938,"xmax":2232.509033203125,"xmin":1251.8868408203125,"ymin":585.7403564453125},"cat":"1","prob":0.9999964237213135},{"bbox":{"ymax":1362.1265869140625,"xmax":1992.9556884765625,"xmin":1462.7811279296875,"ymin":576.5375366210938},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":843.9384155273438,"xmax":3020.7353515625,"xmin":2155.833740234375,"ymin":443.4128112792969},"cat":"1","prob":0.9999885559082031},{"bbox":{"ymax":1063.1495361328125,"xmax":2013.5091552734375,"xmin":1463.1455078125,"ymin":273.2212829589844},"cat":"1","prob":0.9999780654907227},{"bbox":{"ymax":1522.9658203125,"xmax":995.2101440429688,"xmin":404.69158935546877,"ymin":661.9135131835938},"cat":"1","prob":0.9999672174453735},{"bbox":{"ymax":1052.681884765625,"xmax":2815.238525390625,"xmin":2289.733642578125,"ymin":333.2514953613281},"cat":"1","prob":0.9999667406082153},{"bbox":{"ymax":836.4298706054688,"xmax":2637.939697265625,"xmin":2355.98193359375,"ymin":408.5758972167969},"cat":"1","prob":0.9999634027481079},{"bbox":{"ymax":851.178955078125,"xmax":1890.22509765625,"xmin":1474.4056396484375,"ymin":556.6243896484375},"cat":"1","prob":0.9999524354934692},{"bbox":{"ymax":847.6719970703125,"xmax":2792.68505859375,"xmin":2395.373291015625,"ymin":536.9264526367188},"cat":"1","prob":0.9999485015869141},{"bbox":{"ymax":765.6693725585938,"xmax":2743.524658203125,"xmin":2457.97216796875,"ymin":343.8699645996094},"cat":"1","prob":0.9999476671218872},{"bbox":{"ymax":1399.032470703125,"xmax":912.3306884765625,"xmin":101.80696105957031,"ymin":791.3206176757813},"cat":"1","prob":0.9999427795410156},{"bbox":{"ymax":999.32958984375,"xmax":1922.61572265625,"xmin":1631.05029296875,"ymin":556.4727172851563},"cat":"1","prob":0.9999374151229858},{"bbox":{"ymax":1841.7998046875,"xmax":1153.1229248046875,"xmin":62.47994613647461,"ymin":206.24771118164063},
"cat":"1","prob":0.9999256134033203},{"bbox":{"ymax":998.0281982421875,"xmax":1883.58251953125,"xmin":1478.87939453125,"ymin":698.8909912109375},"cat":"1","prob":0.9999082088470459},{"bbox":{"ymax":993.3325805664063,"xmax":1737.7374267578125,"xmin":1440.0362548828125,"ymin":554.254150390625},"cat":"1","prob":0.9999079704284668},{"bbox":{"ymax":1475.51318359375,"xmax":1372.742919921875,"xmin":0.0,"ymin":544.5368041992188},"cat":"1","prob":0.999876856803894},{"bbox":{"ymax":1104.88916015625,"xmax":915.4425659179688,"xmin":116.52717590332031,"ymin":509.49798583984377},"cat":"1","prob":0.9998511075973511},{"bbox":{"ymax":1186.427490234375,"xmax":1840.67041015625,"xmin":1298.7374267578125,"ymin":468.38287353515627},"cat":"1","prob":0.9994650483131409},{"bbox":{"ymax":1176.736083984375,"xmax":2111.431884765625,"xmin":1238.377197265625,"ymin":794.1444702148438},"cat":"1","prob":0.9903794527053833},{"bbox":{"ymax":1044.1197509765625,"xmax":2992.68603515625,"xmin":2149.134521484375,"ymin":686.8695068359375},"cat":"1","prob":0.8381036520004273},{"last":true,"bbox":{"ymax":1228.653076171875,"xmax":2776.582275390625,"xmin":1960.4600830078125,"ymin":899.2258911132813},"cat":"1","prob":0.6939422488212586}],"uri":"PATH_MODEL/test_image.jpeg"}]}}

We obtain far more predictions than we should.
I also tried without setting an image mean in the request. We still obtain different results, with a lot of predictions.

I thought maybe NMS was not working well; however, the bboxes we get here are not exactly the same as with Caffe either.
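One way to look into the NMS hypothesis is to measure how much the TensorRT boxes overlap the Caffe ones. The sketch below is only an illustration: the `iou` and `best_match` helpers are hypothetical names, not part of DeepDetect, and the coordinates are rounded from the two responses above (only the first three TensorRT boxes are included).

```python
from typing import Dict, List

Box = Dict[str, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"])
    ih = min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"])
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    return inter / (area_a + area_b - inter)


def best_match(ref: Box, candidates: List[Box]) -> float:
    """Highest IoU between a reference box and any candidate box."""
    return max((iou(ref, c) for c in candidates), default=0.0)


# Caffe boxes, rounded from the first response above.
caffe_boxes = [
    {"xmin": 1492.46, "ymin": 531.97, "xmax": 1890.06, "ymax": 1026.15},
    {"xmin": 201.79, "ymin": 456.79, "xmax": 960.93, "ymax": 1498.19},
    {"xmin": 2390.68, "ymin": 366.82, "xmax": 2732.44, "ymax": 824.04},
]
# First three TensorRT boxes, rounded from the second response above.
trt_boxes = [
    {"xmin": 1251.89, "ymin": 585.74, "xmax": 2232.51, "ymax": 1008.75},
    {"xmin": 1462.78, "ymin": 576.54, "xmax": 1992.96, "ymax": 1362.13},
    {"xmin": 2155.83, "ymin": 443.41, "xmax": 3020.74, "ymax": 843.94},
]

for ref in caffe_boxes:
    print(round(best_match(ref, trt_boxes), 3))
```

If only NMS were skipped, each Caffe box would be expected to appear almost unchanged (IoU close to 1) among the TensorRT boxes, next to its unsuppressed neighbours. Lower values suggest the box decoding itself differs.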

Hoping that you could help me with this.

Guillaume Infantes · Answer 1 · Tue Aug 11 2020 20:52:21 GMT+0800 (China Standard Time)

Hi there!
This one was a loooong shot, but it should be okay with #770.

Many thanks for the very detailed bug report, it helped a lot!

YaYaB · Answer 2 · Tue Aug 11 2020 23:14:37 GMT+0800 (China Standard Time)

Hey @fantes,
I tried with your fix; however, I get the following error when I try to make a prediction:

[2020-08-11 15:03:10.466] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::GridAnchor_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::NMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Reorg_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Region_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Clip_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::LReLU_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PriorBox_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Normalize_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::RPROI_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::BatchedNMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::FlattenConcat_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::CropAndResize
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::DetectionLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Proposal
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ProposalLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PyramidROIAlign_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ResizeNearest_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Split
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::SpecialSlice_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::InstanceNormalization_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-11 15:11:18.192] [imageserv] [info] setting max batch size to 6
[2020-08-11 15:11:18.991] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 799
dede: nmsPlugin.cpp:54: virtual nvinfer1::Dims nvinfer1::plugin::DetectionOutput::getOutputDimensions(int, const nvinfer1::Dims*, int): Assertion `nbInputDims == 3' failed.
Aborted (core dumped)

tensorrt-oss builds successfully and the binaries used are the correct version. I do not know where this comes from :s

Guillaume Infantes · Answer 3 · Tue Aug 11 2020 23:30:31 GMT+0800 (China Standard Time)

Did you start from a clean build tree? build/tensorrt-oss should be completely removed before rebuilding; it is not automatically cleaned, patched, and rebuilt.

nmsPlugin.cpp is at build/tensorrt-oss/src/tensorrt-oss/plugin/nmsPlugin/nmsPlugin.cpp. Could you check the assertion at the beginning of getOutputDimensions (line 54)? It should read ASSERT(nbInputDims == 3 || nbInputDims == 5);

If the source is patched and rebuilt correctly, then maybe it is the same issue as before: linking to the wrong libnvinfer_plugin.so.
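The source check described above can be scripted. This is a minimal sketch, assuming the unpatched file contains the single-condition expression reported in the abort message (`nbInputDims == 3`); `patch_status` and `check_file` are hypothetical helper names.

```python
import re

# Expression expected after the patch, as quoted in the comment above.
PATCHED = re.compile(r"nbInputDims\s*==\s*3\s*\|\|\s*nbInputDims\s*==\s*5")
# Assumed unpatched form, taken from the abort message "Assertion `nbInputDims == 3' failed".
UNPATCHED = re.compile(r"\(\s*nbInputDims\s*==\s*3\s*\)")


def patch_status(source_text: str) -> str:
    """Classify the text of nmsPlugin.cpp as 'patched', 'unpatched' or 'unknown'."""
    if PATCHED.search(source_text):
        return "patched"
    if UNPATCHED.search(source_text):
        return "unpatched"
    return "unknown"


def check_file(path: str) -> str:
    """Read a source file and report its patch status."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return patch_status(handle.read())


print(patch_status("ASSERT(nbInputDims == 3 || nbInputDims == 5);"))
print(patch_status("ASSERT(nbInputDims == 3);"))
```

Call `check_file` with the path to nmsPlugin.cpp in your own build tree. A "patched" result on the source does not rule out the other cause mentioned above, a binary linked against a different libnvinfer_plugin.so.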

YaYaB · Answer 4 · Wed Aug 12 2020 17:32:22 GMT+0800 (China Standard Time)

Hey!
After a bit of struggle I finally succeeded in transforming a detection model. I got good results but still have some weird bboxes.
See the first two bboxes :O

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":40653.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":-4440.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.875},{"bbox":{"ymax":-2368.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.0},{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}
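The two suspicious entries can be picked out mechanically: they have zero width, a negative height, and one has a probability above 1. The sketch below is a diagnostic aid only, not a fix for the plugin; `is_plausible` is a hypothetical helper and the sample data is copied from the response above (first three detections).

```python
def is_plausible(det: dict) -> bool:
    """A detection is plausible if its probability lies in [0, 1] and its box has positive area."""
    box = det["bbox"]
    return (
        0.0 <= det["prob"] <= 1.0
        and box["xmax"] > box["xmin"]
        and box["ymax"] > box["ymin"]
    )


# First three detections from the response above.
detections = [
    {"bbox": {"ymax": -4440.0, "xmax": 0.0, "xmin": 0.0, "ymin": 0.0},
     "cat": "0", "prob": 1.875},
    {"bbox": {"ymax": -2368.0, "xmax": 0.0, "xmin": 0.0, "ymin": 0.0},
     "cat": "0", "prob": 1.0},
    {"bbox": {"ymax": 1026.1483154296875, "xmax": 1890.064453125,
              "xmin": 1492.464111328125, "ymin": 531.9712524414063},
     "cat": "1", "prob": 0.999995231628418},
]

suspicious = [d for d in detections if not is_plausible(d)]
for det in suspicious:
    print(det["cat"], det["prob"], det["bbox"])
```

Both flagged entries carry "cat":"0", while every face detection in this thread carries "cat":"1".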

Btw, I checked your MR and saw that you renamed the binary folder for TensorRT (it was not working at first because of this; I needed to update my Dockerfile).
However, the modification was not applied everywhere. At L909 of the CMakeLists.txt you have

      -DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/bin

But to be coherent with your modification it should be

      -DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/src/tensorrt-oss-build

No ?

Guillaume Infantes · Answer 5 · Wed Aug 12 2020 17:48:21 GMT+0800 (China Standard Time)

hi

You're right, the way to define the output binary dir has changed in TRT from TRT_BIN_DIR to TRT_OUT_DIR; I did not notice it.
I just fixed it, so with the updated PR the older behavior is restored, i.e. the .so files are in ${CMAKE_BINARY_DIR}/tensorrt-oss/bin, i.e. build/tensorrt-oss/bin (I guess you'll have to roll back your Dockerfile).

Could you give me a test case where the bboxes are weird? Is it the same as above?

thanks !

YaYaB · Answer 6 · Wed Aug 12 2020 17:58:34 GMT+0800 (China Standard Time)

Hey, great!
It is the same test case as above!

YaYaB · Answer 7 · Tue Aug 18 2020 15:28:58 GMT+0800 (China Standard Time)

Hey @fantes Any update on this?

Emmanuel Benazera · Answer 8 · Tue Aug 18 2020 17:45:35 GMT+0800 (China Standard Time)

Have you looked at #770 ? @fantes is away for the next 15 days.

YaYaB · Answer 9 · Tue Aug 18 2020 17:55:34 GMT+0800 (China Standard Time)

Thanks for the reply.
Yep, I tried #770. However, I got some empty bboxes, as explained in #769 (comment).

Guillaume Infantes · Answer 10 · Tue Sep 01 2020 19:29:52 GMT+0800 (China Standard Time)

Hi @YaYaB, unfortunately I cannot reproduce. After a compile from scratch with -DUSE_TENSORRT=ON -DUSE_TENSORRT_OSS=ON -DUSE_CAFFE=ON, and using your exact example, I get

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":25366.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"/home/infantes/test_image.jpeg"}]}}

which seems correct (my commit is cb9df34)
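To back up "seems correct" with numbers, the TensorRT response can be compared with the Caffe response from the original report. This is a sketch under assumptions: the 0.01 pixel tolerance is an arbitrary choice, and `same_box` and `same_detections` are hypothetical helper names. The values are copied from the two responses in this thread.

```python
import math

KEYS = ("xmin", "ymin", "xmax", "ymax")


def same_box(a: dict, b: dict, abs_tol: float = 0.01) -> bool:
    """True when all four coordinates agree within abs_tol pixels."""
    return all(math.isclose(a[k], b[k], abs_tol=abs_tol) for k in KEYS)


def same_detections(ref: list, other: list, abs_tol: float = 0.01) -> bool:
    """True when both lists have the same length and match pairwise, in order."""
    return len(ref) == len(other) and all(
        r["cat"] == o["cat"] and same_box(r["bbox"], o["bbox"], abs_tol)
        for r, o in zip(ref, other)
    )


caffe = [
    {"cat": "1", "bbox": {"xmin": 1492.4638671875, "ymin": 531.9712524414063,
                          "xmax": 1890.0640869140625, "ymax": 1026.1483154296875}},
    {"cat": "1", "bbox": {"xmin": 201.794677734375, "ymin": 456.78582763671877,
                          "xmax": 960.9320678710938, "ymax": 1498.1898193359375}},
    {"cat": "1", "bbox": {"xmin": 2390.67578125, "ymin": 366.8230285644531,
                          "xmax": 2732.435791015625, "ymax": 824.03515625}},
]
tensorrt = [
    {"cat": "1", "bbox": {"xmin": 1492.464111328125, "ymin": 531.9712524414063,
                          "xmax": 1890.064453125, "ymax": 1026.1483154296875}},
    {"cat": "1", "bbox": {"xmin": 201.79478454589845, "ymin": 456.7858581542969,
                          "xmax": 960.9320068359375, "ymax": 1498.189453125}},
    {"cat": "1", "bbox": {"xmin": 2390.67578125, "ymin": 366.8230895996094,
                          "xmax": 2732.435791015625, "ymax": 824.0353393554688}},
]

print(same_detections(caffe, tensorrt))
```

The comparison is order-sensitive, which works here because both responses list detections in the same order, by decreasing probability.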

YaYaB · Answer 11 · Tue Sep 01 2020 23:59:45 GMT+0800 (China Standard Time)

@fantes I've tried again building from scratch with the specific commit and now it works.
Thanks!