Different predictions with TensorRT 7 for a RefineDet model
YaYaB opened this issue
Configuration
- Version of DeepDetect:
- Locally compiled on:
- Ubuntu 14.04 LTS
- Ubuntu 18.04 LTS
- Mac OSX
- Other:
- Docker
- Amazon AMI
- Commit (shown by the server when starting):
0eac8a4
Your question / the problem you're facing:
I successfully built DeepDetect (DD) with the TensorRT backend and tensorrt-oss.
A RefineDet model loads correctly; however, the results are very different from those obtained with the original Caffe model.
The following steps reproduce the problem:
Download the model and a test image
PATH_MODEL="PATH_WANTED"
mkdir $PATH_MODEL && cd $PATH_MODEL
wget https://deepdetect.com/models/init/desktop/images/detection/faces_512.tar.gz
wget -O test_image.jpeg https://miro.medium.com/max/6528/1*DYUaxku5bfbZaDLW-4SyWg.jpeg
tar -xvf faces_512.tar.gz
Error message (if any) / steps to reproduce the problem:
Let us first use the Caffe version of this model.
Load the model
- list of API calls:
curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"caffe",
"description":"image classification service",
"type":"supervised",
"parameters":{
"input":{
"connector":"image",
"width":512,
"height":512
},
"mllib": {
"nclasses": 2
}
},
"model":{
"repository":"PATH_MODEL"
}
}'
- Server log output:
[2020-08-10 12:15:47.549] [imageserv] [info] Using pre-trained weights from PATH_MODEL/model_iter_20000.caffemodel
[2020-08-10 12:15:47.648] [caffe] [info] Ignoring source layer label_data_1_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_loc_ftune_arm_loc_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_conf_ftune_arm_conf_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_priorbox_arm_priorbox_0_split
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer arm_loss
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer odm_loss
[2020-08-10 12:15:47.675] [imageserv] [info] Net total flops=95632720896 / total params=33904320
[2020-08-10 12:15:47.675] [imageserv] [info] detected network type is detection
[2020-08-10 12:15:47.675] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 3867
Make a prediction
- list of API calls:
curl -X POST "http://localhost:8080/predict" -d '{
"service":"imageserv",
"parameters":{
"input":{
"width":512,
"height":512
},
"output":{
"confidence_threshold": 0.5,
"bbox": true
}
},
"data": ["PATH_MODEL/test_image.jpeg"]
}'
- Server log output:
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":4203.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.0640869140625,"xmin":1492.4638671875,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.1898193359375,"xmax":960.9320678710938,"xmin":201.794677734375,"ymin":456.78582763671877},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.03515625,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230285644531},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}
We obtain 3 faces as expected (see the image here).
Now let us do the same with the TensorRT version of the model.
Load the model
- list of API calls:
curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"tensorrt",
"description":"image classification service",
"type":"supervised",
"parameters":{
"input":{
"connector":"image",
"width":512,
"height":512,
"mean": [104.146, 110.808, 119.856]
},
"mllib": {
"datatype": "fp32",
"nclasses": 1,
"maxBatchSize": 6,
"maxWorkspaceSize": 1000,
"gpuid":0
}
},
"model":{
"repository":"PATH_MODEL"
}
}'
- Server log output:
DeepDetect [ commit 0eac8a49404292529a6ea1810290af800092902c ]
[2020-08-10 12:19:38.712] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - GridAnchor_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - NMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Reorg_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Region_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PriorBox_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Normalize_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - RPROI_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchedNMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - FlattenConcat_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - CropAndResize
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Proposal
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchTilePlugin_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - DetectionLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ProposalLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PyramidROIAlign_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ResizeNearest_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - SpecialSlice_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - InstanceNormalization_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-10 12:19:40.624] [imageserv] [info] setting max batch size to 6
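The `setting max workspace size to 1048576000` line is consistent with `maxWorkspaceSize` being read in MiB: 1000 × 2^20 = 1,048,576,000 bytes. A minimal check of that arithmetic, assuming the MiB unit (the helper name `workspace_bytes` is hypothetical):

```python
MIB = 1024 * 1024  # bytes per mebibyte


def workspace_bytes(max_workspace_size_mib: int) -> int:
    """Convert maxWorkspaceSize to bytes, ASSUMING the value is given in MiB."""
    return max_workspace_size_mib * MIB


if __name__ == "__main__":
    # 1000 is the maxWorkspaceSize sent in the PUT call above.
    print(workspace_bytes(1000))
```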
Make predictions
- list of API calls:
curl -X POST "http://localhost:8080/predict" -d '{
"service":"imageserv",
"parameters":{
"input":{
"width":512,
"height":512
},
"output":{
"confidence_threshold": 0.5,
"bbox": true
}
},
"data": ["PATH_MODEL/test_image.jpeg"]
}'
- Server log output:
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":86.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1008.7504272460938,"xmax":2232.509033203125,"xmin":1251.8868408203125,"ymin":585.7403564453125},"cat":"1","prob":0.9999964237213135},{"bbox":{"ymax":1362.1265869140625,"xmax":1992.9556884765625,"xmin":1462.7811279296875,"ymin":576.5375366210938},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":843.9384155273438,"xmax":3020.7353515625,"xmin":2155.833740234375,"ymin":443.4128112792969},"cat":"1","prob":0.9999885559082031},{"bbox":{"ymax":1063.1495361328125,"xmax":2013.5091552734375,"xmin":1463.1455078125,"ymin":273.2212829589844},"cat":"1","prob":0.9999780654907227},{"bbox":{"ymax":1522.9658203125,"xmax":995.2101440429688,"xmin":404.69158935546877,"ymin":661.9135131835938},"cat":"1","prob":0.9999672174453735},{"bbox":{"ymax":1052.681884765625,"xmax":2815.238525390625,"xmin":2289.733642578125,"ymin":333.2514953613281},"cat":"1","prob":0.9999667406082153},{"bbox":{"ymax":836.4298706054688,"xmax":2637.939697265625,"xmin":2355.98193359375,"ymin":408.5758972167969},"cat":"1","prob":0.9999634027481079},{"bbox":{"ymax":851.178955078125,"xmax":1890.22509765625,"xmin":1474.4056396484375,"ymin":556.6243896484375},"cat":"1","prob":0.9999524354934692},{"bbox":{"ymax":847.6719970703125,"xmax":2792.68505859375,"xmin":2395.373291015625,"ymin":536.9264526367188},"cat":"1","prob":0.9999485015869141},{"bbox":{"ymax":765.6693725585938,"xmax":2743.524658203125,"xmin":2457.97216796875,"ymin":343.8699645996094},"cat":"1","prob":0.9999476671218872},{"bbox":{"ymax":1399.032470703125,"xmax":912.3306884765625,"xmin":101.80696105957031,"ymin":791.3206176757813},"cat":"1","prob":0.9999427795410156},{"bbox":{"ymax":999.32958984375,"xmax":1922.61572265625,"xmin":1631.05029296875,"ymin":556.4727172851563},"cat":"1","prob":0.9999374151229858},{"bbox":{"ymax":1841.7998046875,"xmax":1153.1229248046875,"xmin":62.47994613647461,"ymin":206.24771118164063},
"cat":"1","prob":0.9999256134033203},{"bbox":{"ymax":998.0281982421875,"xmax":1883.58251953125,"xmin":1478.87939453125,"ymin":698.8909912109375},"cat":"1","prob":0.9999082088470459},{"bbox":{"ymax":993.3325805664063,"xmax":1737.7374267578125,"xmin":1440.0362548828125,"ymin":554.254150390625},"cat":"1","prob":0.9999079704284668},{"bbox":{"ymax":1475.51318359375,"xmax":1372.742919921875,"xmin":0.0,"ymin":544.5368041992188},"cat":"1","prob":0.999876856803894},{"bbox":{"ymax":1104.88916015625,"xmax":915.4425659179688,"xmin":116.52717590332031,"ymin":509.49798583984377},"cat":"1","prob":0.9998511075973511},{"bbox":{"ymax":1186.427490234375,"xmax":1840.67041015625,"xmin":1298.7374267578125,"ymin":468.38287353515627},"cat":"1","prob":0.9994650483131409},{"bbox":{"ymax":1176.736083984375,"xmax":2111.431884765625,"xmin":1238.377197265625,"ymin":794.1444702148438},"cat":"1","prob":0.9903794527053833},{"bbox":{"ymax":1044.1197509765625,"xmax":2992.68603515625,"xmin":2149.134521484375,"ymin":686.8695068359375},"cat":"1","prob":0.8381036520004273},{"last":true,"bbox":{"ymax":1228.653076171875,"xmax":2776.582275390625,"xmin":1960.4600830078125,"ymin":899.2258911132813},"cat":"1","prob":0.6939422488212586}],"uri":"PATH_MODEL/test_image.jpeg"}]}}
We obtain far more predictions than expected.
I also tried without setting an image mean in the request; the results are still different, with many predictions.
I thought NMS might not be working properly; however, that alone would not explain it, since we do not get exactly the same bboxes as with Caffe.
I hope you can help me with this.
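One way to quantify "not exactly the same bboxes" is to match each TensorRT box to the Caffe box it overlaps most, using intersection over union (IoU), the overlap measure that NMS also relies on. A minimal sketch, with coordinates copied from the two responses above (only the first three TensorRT boxes); the helpers `iou` and `best_match` are hypothetical:

```python
from typing import Dict, List, Tuple

Box = Dict[str, float]

# Caffe detections, copied from the first /predict response.
CAFFE_BOXES: List[Box] = [
    {"xmin": 1492.4638671875, "ymin": 531.9712524414063, "xmax": 1890.0640869140625, "ymax": 1026.1483154296875},
    {"xmin": 201.794677734375, "ymin": 456.78582763671877, "xmax": 960.9320678710938, "ymax": 1498.1898193359375},
    {"xmin": 2390.67578125, "ymin": 366.8230285644531, "xmax": 2732.435791015625, "ymax": 824.03515625},
]

# First three TensorRT detections, copied from the second /predict response.
TRT_BOXES: List[Box] = [
    {"xmin": 1251.8868408203125, "ymin": 585.7403564453125, "xmax": 2232.509033203125, "ymax": 1008.7504272460938},
    {"xmin": 1462.7811279296875, "ymin": 576.5375366210938, "xmax": 1992.9556884765625, "ymax": 1362.1265869140625},
    {"xmin": 2155.833740234375, "ymin": 443.4128112792969, "xmax": 3020.7353515625, "ymax": 843.9384155273438},
]


def area(b: Box) -> float:
    """Area of a box; inverted (empty) boxes count as 0."""
    return max(0.0, b["xmax"] - b["xmin"]) * max(0.0, b["ymax"] - b["ymin"])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = {
        "xmin": max(a["xmin"], b["xmin"]), "ymin": max(a["ymin"], b["ymin"]),
        "xmax": min(a["xmax"], b["xmax"]), "ymax": min(a["ymax"], b["ymax"]),
    }
    i = area(inter)
    union = area(a) + area(b) - i
    return i / union if union > 0 else 0.0


def best_match(box: Box, refs: List[Box]) -> Tuple[int, float]:
    """Index of the reference box with the highest IoU, and that IoU."""
    scores = [iou(box, r) for r in refs]
    best = max(range(len(refs)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    for n, box in enumerate(TRT_BOXES):
        idx, score = best_match(box, CAFFE_BOXES)
        print(f"trt[{n}] -> caffe[{idx}] IoU={score:.2f}")
```

Best-match IoU values well below 1 would indicate that the TensorRT boxes are shifted or resized, rather than being exact duplicates of the Caffe boxes that NMS failed to suppress.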
Hi there!
This one was a loooong shot, but it should be okay with #770.
Many thanks for the very detailed bug report, it helped a lot!
Hey @fantes,
I tried your fix; however, I get the following error when I try to make a prediction:
[2020-08-11 15:03:10.466] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::GridAnchor_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::NMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Reorg_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Region_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Clip_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::LReLU_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PriorBox_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Normalize_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::RPROI_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::BatchedNMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::FlattenConcat_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::CropAndResize
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::DetectionLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Proposal
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ProposalLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PyramidROIAlign_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ResizeNearest_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Split
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::SpecialSlice_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::InstanceNormalization_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-11 15:11:18.192] [imageserv] [info] setting max batch size to 6
[2020-08-11 15:11:18.991] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 799
dede: nmsPlugin.cpp:54: virtual nvinfer1::Dims nvinfer1::plugin::DetectionOutput::getOutputDimensions(int, const nvinfer1::Dims*, int): Assertion `nbInputDims == 3' failed.
Aborted (core dumped)
tensorrt-oss builds successfully and the binaries being used are the correct version. I do not know where this comes from :s
Did you start from a clean build tree? build/tensorrt-oss should be removed completely before rebuilding, since it is not automatically cleaned, patched, and rebuilt.
nmsPlugin.cpp is at build/tensorrt-oss/src/tensorrt-oss/plugin/nmsPlugin/nmsPlugin.cpp. Could you check the assertion at the beginning of getOutputDimensions (line 54)? It should read ASSERT(nbInputDims == 3 || nbInputDims == 5);
If the source is patched and rebuilt correctly, then it may be the same issue as before: linking against the wrong libnvinfer_plugin.so.
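To verify that the patched source is the one in the build tree, the relaxed assertion can be searched for in nmsPlugin.cpp. A small sketch; the file location is assumed from the comment above, and `assertion_is_patched` is a hypothetical helper, not part of DeepDetect's build:

```python
import re
from pathlib import Path

# Assumed location (see the comment above); adjust to your build tree.
NMS_PLUGIN = Path("build/tensorrt-oss/src/tensorrt-oss/plugin/nmsPlugin/nmsPlugin.cpp")

# Matches ASSERT(nbInputDims == 3 || nbInputDims == 5), tolerating extra spaces.
PATCHED = re.compile(r"ASSERT\(\s*nbInputDims\s*==\s*3\s*\|\|\s*nbInputDims\s*==\s*5\s*\)")


def assertion_is_patched(source: str) -> bool:
    """True if the source contains the relaxed 3-or-5 input assertion."""
    return PATCHED.search(source) is not None


if __name__ == "__main__":
    if NMS_PLUGIN.exists():
        patched = assertion_is_patched(NMS_PLUGIN.read_text())
        print("patched" if patched else "NOT patched")
    else:
        print(f"{NMS_PLUGIN} not found; was tensorrt-oss rebuilt from scratch?")
```

If the source is patched but the server still aborts on `nbInputDims == 3`, that points to the second hypothesis above: the binary is loading another libnvinfer_plugin.so.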
Hey!
After a bit of a struggle, I finally succeeded in converting a detection model. The results are good, but I still get some weird bboxes.
See the first two bboxes :O
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":40653.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":-4440.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.875},{"bbox":{"ymax":-2368.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.0},{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}
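These two entries can at least be recognized on the client side: the first has a probability above 1, and both have zero width and a negative height. A client-side sketch of such a filter (it hides the symptom, it does not fix the plugin), using the response above trimmed to the two weird boxes plus one normal box; `is_valid` and `valid_detections` are hypothetical helpers:

```python
from typing import Any, Dict, List

# Trimmed copy of the /predict response above.
RESPONSE: Dict[str, Any] = {"body": {"predictions": [{"classes": [
    {"bbox": {"ymax": -4440.0, "xmax": 0.0, "xmin": 0.0, "ymin": 0.0}, "cat": "0", "prob": 1.875},
    {"bbox": {"ymax": -2368.0, "xmax": 0.0, "xmin": 0.0, "ymin": 0.0}, "cat": "0", "prob": 1.0},
    {"bbox": {"ymax": 1026.1483154296875, "xmax": 1890.064453125,
              "xmin": 1492.464111328125, "ymin": 531.9712524414063},
     "cat": "1", "prob": 0.999995231628418},
]}]}}


def is_valid(det: Dict[str, Any]) -> bool:
    """Reject probabilities outside [0, 1] and boxes without positive area."""
    b = det["bbox"]
    return (0.0 <= det["prob"] <= 1.0
            and b["xmax"] > b["xmin"]
            and b["ymax"] > b["ymin"])


def valid_detections(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten all predictions and keep only the sane detections."""
    return [d for pred in response["body"]["predictions"]
            for d in pred["classes"] if is_valid(d)]


if __name__ == "__main__":
    for d in valid_detections(RESPONSE):
        print(d["cat"], d["prob"], d["bbox"])
```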
By the way, I checked your MR and saw that you renamed the binary folder for TensorRT (it did not work at first because of this; I had to update my Dockerfile).
However, the modification is not applied everywhere: at line 909 of CMakeLists.txt you have
-DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/bin
But to be consistent with your modification, it should be
-DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/src/tensorrt-oss-build
No?
Hi,
you're right: the way to define the output binary dir has changed in TRT, from TRT_BIN_DIR to TRT_OUT_DIR, and I did not notice it.
I just fixed it, see the updated PR: the older behavior is restored, i.e. the .so files are in ${CMAKE_BINARY_DIR}/tensorrt-oss/bin, i.e. build/tensorrt-oss/bin (I guess you'll have to roll back your Dockerfile).
Could you give me a test case where the bboxes are weird? Is it the same as above?
Thanks!
Hey, great!
It is the same test case as above!
Thanks for the reply.
Yep, I tried #770, but I got some empty bboxes, as explained in #769 (comment).
Hi @YaYaB, unfortunately I cannot reproduce this. After compiling from scratch with -DUSE_TENSORRT=ON -DUSE_TENSORRT_OSS=ON -DUSE_CAFFE=ON
and using your exact example, I get
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":25366.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"/home/infantes/test_image.jpeg"}]}}
which seems correct (my commit is cb9df34)
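For reference, these boxes agree with the Caffe output from the original report to within a thousandth of a pixel, with identical probabilities. A small check with the coordinates copied from both responses (the helper `max_abs_diff` is hypothetical):

```python
from typing import Dict, List

Box = Dict[str, float]
KEYS = ("xmin", "ymin", "xmax", "ymax")

# Caffe backend, from the original report.
CAFFE: List[Box] = [
    {"xmin": 1492.4638671875, "ymin": 531.9712524414063, "xmax": 1890.0640869140625, "ymax": 1026.1483154296875},
    {"xmin": 201.794677734375, "ymin": 456.78582763671877, "xmax": 960.9320678710938, "ymax": 1498.1898193359375},
    {"xmin": 2390.67578125, "ymin": 366.8230285644531, "xmax": 2732.435791015625, "ymax": 824.03515625},
]

# TensorRT backend after the fix (commit cb9df34), same ordering by probability.
TRT: List[Box] = [
    {"xmin": 1492.464111328125, "ymin": 531.9712524414063, "xmax": 1890.064453125, "ymax": 1026.1483154296875},
    {"xmin": 201.79478454589845, "ymin": 456.7858581542969, "xmax": 960.9320068359375, "ymax": 1498.189453125},
    {"xmin": 2390.67578125, "ymin": 366.8230895996094, "xmax": 2732.435791015625, "ymax": 824.0353393554688},
]


def max_abs_diff(a: List[Box], b: List[Box]) -> float:
    """Largest coordinate difference between two equally ordered box lists."""
    if len(a) != len(b):
        raise ValueError("different number of detections")
    return max(abs(x[k] - y[k]) for x, y in zip(a, b) for k in KEYS)


if __name__ == "__main__":
    print(f"max coordinate difference: {max_abs_diff(CAFFE, TRT):.6f} px")
```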