jolibrain / deepdetect

Deep Learning API and Server in C++14 with support for Caffe, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE

Home Page: https://www.deepdetect.com/

Different predictions with TensorRT 7 for a RefineDet model

YaYaB opened this issue

Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 14.04 LTS
      • Ubuntu 18.04 LTS
      • Mac OSX
      • Other:
    • Docker
    • Amazon AMI
  • Commit (shown by the server when starting):
    0eac8a4

Your question / the problem you're facing:

I successfully built a version of DD with the TensorRT backend and tensorrt-oss.
A RefineDet model loads correctly; however, the results are very different from those obtained with the original model in Caffe.

The following steps will help reproduce:

Download the model

PATH_MODEL="PATH_WANTED"
mkdir -p "$PATH_MODEL" && cd "$PATH_MODEL"
wget https://deepdetect.com/models/init/desktop/images/detection/faces_512.tar.gz
wget -O test_image.jpeg https://miro.medium.com/max/6528/1*DYUaxku5bfbZaDLW-4SyWg.jpeg
tar -xvf faces_512.tar.gz

Error message (if any) / steps to reproduce the problem:

Let us first use the Caffe version of this model.
Load the model

  • list of API calls:
curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"caffe",
"description":"image classification service",
"type":"supervised",
"parameters":{
    "input":{
    "connector":"image",
    "width":512,
    "height":512
    },
    "mllib": {
       "nclasses": 2
    }
},
"model":{
    "repository":"PATH_MODEL"
}
}'
  • Server log output:
[2020-08-10 12:15:47.549] [imageserv] [info] Using pre-trained weights from PATH_MODEL/model_iter_20000.caffemodel
[2020-08-10 12:15:47.648] [caffe] [info] Ignoring source layer label_data_1_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_loc_ftune_arm_loc_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_conf_ftune_arm_conf_0_split
[2020-08-10 12:15:47.670] [caffe] [info] Ignoring source layer arm_priorbox_arm_priorbox_0_split
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer arm_loss
[2020-08-10 12:15:47.671] [caffe] [info] Ignoring source layer odm_loss
[2020-08-10 12:15:47.675] [imageserv] [info] Net total flops=95632720896 / total params=33904320
[2020-08-10 12:15:47.675] [imageserv] [info] detected network type is detection
[2020-08-10 12:15:47.675] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 3867

Make a prediction

  • list of API calls:
curl -X POST "http://localhost:8080/predict" -d '{
      "service":"imageserv",
      "parameters":{
        "input":{
          "width":512,
          "height":512
        },
        "output":{
          "confidence_threshold": 0.5,
          "bbox": true
        }
      },
      "data": ["PATH_MODEL/test_image.jpeg"]
    }'

  • Server log output:
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":4203.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.0640869140625,"xmin":1492.4638671875,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.1898193359375,"xmax":960.9320678710938,"xmin":201.794677734375,"ymin":456.78582763671877},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.03515625,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230285644531},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}

We obtain 3 faces as expected (see the image here).

Now let us do the same with the TensorRT version of the model.
Load the model

  • list of API calls:
curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"mllib":"tensorrt",
"description":"image classification service",
"type":"supervised",
"parameters":{
    "input":{
    "connector":"image",
    "width":512,
    "height":512,
    "mean": [104.146, 110.808, 119.856]
    },
    "mllib": {
       "datatype": "fp32",
       "nclasses": 1,
       "maxBatchSize": 6,
       "maxWorkspaceSize": 1000,
       "gpuid":0
    }
},
"model":{
    "repository":"PATH_MODEL"
}
}'
  • Server log output:
DeepDetect [ commit 0eac8a49404292529a6ea1810290af800092902c ]
[2020-08-10 12:19:38.712] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - GridAnchor_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - NMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Reorg_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Region_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PriorBox_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Normalize_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - RPROI_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchedNMS_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - FlattenConcat_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - CropAndResize
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - Proposal
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - BatchTilePlugin_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - DetectionLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ProposalLayer_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - PyramidROIAlign_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - ResizeNearest_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - SpecialSlice_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] Plugin Creator registration succeeded - InstanceNormalization_TRT
[2020-08-10 12:19:40.624] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-10 12:19:40.624] [imageserv] [info] setting max batch size to 6

Make predictions

  • list of API calls:
curl -X POST "http://localhost:8080/predict" -d '{
      "service":"imageserv",
      "parameters":{
        "input":{
          "width":512,
          "height":512
        },
        "output":{
          "confidence_threshold": 0.5,
          "bbox": true
        }
      },
      "data": ["PATH_MODEL/test_image.jpeg"]
    }'
  • Server log output:
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":86.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1008.7504272460938,"xmax":2232.509033203125,"xmin":1251.8868408203125,"ymin":585.7403564453125},"cat":"1","prob":0.9999964237213135},{"bbox":{"ymax":1362.1265869140625,"xmax":1992.9556884765625,"xmin":1462.7811279296875,"ymin":576.5375366210938},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":843.9384155273438,"xmax":3020.7353515625,"xmin":2155.833740234375,"ymin":443.4128112792969},"cat":"1","prob":0.9999885559082031},{"bbox":{"ymax":1063.1495361328125,"xmax":2013.5091552734375,"xmin":1463.1455078125,"ymin":273.2212829589844},"cat":"1","prob":0.9999780654907227},{"bbox":{"ymax":1522.9658203125,"xmax":995.2101440429688,"xmin":404.69158935546877,"ymin":661.9135131835938},"cat":"1","prob":0.9999672174453735},{"bbox":{"ymax":1052.681884765625,"xmax":2815.238525390625,"xmin":2289.733642578125,"ymin":333.2514953613281},"cat":"1","prob":0.9999667406082153},{"bbox":{"ymax":836.4298706054688,"xmax":2637.939697265625,"xmin":2355.98193359375,"ymin":408.5758972167969},"cat":"1","prob":0.9999634027481079},{"bbox":{"ymax":851.178955078125,"xmax":1890.22509765625,"xmin":1474.4056396484375,"ymin":556.6243896484375},"cat":"1","prob":0.9999524354934692},{"bbox":{"ymax":847.6719970703125,"xmax":2792.68505859375,"xmin":2395.373291015625,"ymin":536.9264526367188},"cat":"1","prob":0.9999485015869141},{"bbox":{"ymax":765.6693725585938,"xmax":2743.524658203125,"xmin":2457.97216796875,"ymin":343.8699645996094},"cat":"1","prob":0.9999476671218872},{"bbox":{"ymax":1399.032470703125,"xmax":912.3306884765625,"xmin":101.80696105957031,"ymin":791.3206176757813},"cat":"1","prob":0.9999427795410156},{"bbox":{"ymax":999.32958984375,"xmax":1922.61572265625,"xmin":1631.05029296875,"ymin":556.4727172851563},"cat":"1","prob":0.9999374151229858},{"bbox":{"ymax":1841.7998046875,"xmax":1153.1229248046875,"xmin":62.47994613647461,"ymin":206.24771118164063},"cat":"1","prob":0.9999256134033203},{"bbox":{"ymax":998.0281982421875,"xmax":1883.58251953125,"xmin":1478.87939453125,"ymin":698.8909912109375},"cat":"1","prob":0.9999082088470459},{"bbox":{"ymax":993.3325805664063,"xmax":1737.7374267578125,"xmin":1440.0362548828125,"ymin":554.254150390625},"cat":"1","prob":0.9999079704284668},{"bbox":{"ymax":1475.51318359375,"xmax":1372.742919921875,"xmin":0.0,"ymin":544.5368041992188},"cat":"1","prob":0.999876856803894},{"bbox":{"ymax":1104.88916015625,"xmax":915.4425659179688,"xmin":116.52717590332031,"ymin":509.49798583984377},"cat":"1","prob":0.9998511075973511},{"bbox":{"ymax":1186.427490234375,"xmax":1840.67041015625,"xmin":1298.7374267578125,"ymin":468.38287353515627},"cat":"1","prob":0.9994650483131409},{"bbox":{"ymax":1176.736083984375,"xmax":2111.431884765625,"xmin":1238.377197265625,"ymin":794.1444702148438},"cat":"1","prob":0.9903794527053833},{"bbox":{"ymax":1044.1197509765625,"xmax":2992.68603515625,"xmin":2149.134521484375,"ymin":686.8695068359375},"cat":"1","prob":0.8381036520004273},{"last":true,"bbox":{"ymax":1228.653076171875,"xmax":2776.582275390625,"xmin":1960.4600830078125,"ymin":899.2258911132813},"cat":"1","prob":0.6939422488212586}],"uri":"PATH_MODEL/test_image.jpeg"}]}}

We obtain far more predictions than we should (21 boxes instead of the 3 returned by Caffe).
I also tried without setting an image mean in the request. We still obtain different results, with a lot of predictions.
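To compare the two backends with a number rather than by eye, the boxes in each /predict response can be counted. The sketch below is a quick diagnostic, not a JSON parser: it assumes the response arrives on stdin and relies on each box in "classes" carrying exactly one "prob" key, as in the responses above. The function name `count_detections` is hypothetical.

```shell
count_detections() {
  # Reads a DeepDetect /predict JSON response on stdin and prints the number
  # of detected boxes: each box in "classes" has exactly one "prob" key.
  grep -o '"prob":' | wc -l | tr -d ' '
}
```

Piping the Caffe response from above into it gives 3, and the TensorRT response gives 21; in practice the curl call can be piped straight into it (`curl -s -X POST "http://localhost:8080/predict" -d '...' | count_detections`).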

I thought NMS might not be working properly; however, the bboxes themselves are not exactly the same as the Caffe ones either, so it does not look like a pure NMS issue.
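One way to test the NMS hypothesis is to measure how much the extra TensorRT boxes overlap. Non-maximum suppression drops a lower-scoring box when its intersection-over-union (IoU) with a higher-scoring box exceeds a threshold (the threshold used by this model is not given in the thread), so pairs with a high IoU are ones NMS should have suppressed. A minimal awk-based sketch; the function name `iou` and its argument order are illustrative assumptions, with coordinates taken from DeepDetect's "bbox" fields.

```shell
iou() {
  # Arguments: xmin1 ymin1 xmax1 ymax1 xmin2 ymin2 xmax2 ymax2
  # Prints the intersection-over-union of the two boxes with 4 decimals.
  awk -v a1="$1" -v b1="$2" -v c1="$3" -v d1="$4" \
      -v a2="$5" -v b2="$6" -v c2="$7" -v d2="$8" 'BEGIN {
    # Force numeric comparisons on the -v arguments
    a1 += 0; b1 += 0; c1 += 0; d1 += 0; a2 += 0; b2 += 0; c2 += 0; d2 += 0
    # Intersection rectangle, clamped to zero when the boxes do not overlap
    iw = (c1 < c2 ? c1 : c2) - (a1 > a2 ? a1 : a2)
    ih = (d1 < d2 ? d1 : d2) - (b1 > b2 ? b1 : b2)
    if (iw < 0) iw = 0
    if (ih < 0) ih = 0
    inter = iw * ih
    union_area = (c1 - a1) * (d1 - b1) + (c2 - a2) * (d2 - b2) - inter
    printf "%.4f\n", (union_area > 0 ? inter / union_area : 0)
  }'
}
```

For example, the first two TensorRT boxes can be compared with `iou 1251.8868408203125 585.7403564453125 2232.509033203125 1008.7504272460938 1462.7811279296875 576.5375366210938 1992.9556884765625 1362.1265869140625` (note that the JSON lists ymax, xmax, xmin, ymin, so the fields must be reordered).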

I hope you can help me with this.

Hi there!
This one was a loooong shot, but it should be okay with this: #770

Many thanks for the very detailed bug report, it helped a lot!

Hey @fantes,
I tried your fix; however, I get the following error when I try to make a prediction:

[2020-08-11 15:03:10.466] [api] [info] Running DeepDetect HTTP server on localhost:8080
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::GridAnchor_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::NMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Reorg_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Region_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Clip_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::LReLU_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PriorBox_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Normalize_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::RPROI_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::BatchedNMS_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::FlattenConcat_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::CropAndResize
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::DetectionLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Proposal
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ProposalLayer_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::PyramidROIAlign_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::ResizeNearest_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::Split
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::SpecialSlice_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] Plugin creator registration succeeded - ::InstanceNormalization_TRT
[2020-08-11 15:11:18.192] [imageserv] [info] setting max workspace size to 1048576000
[2020-08-11 15:11:18.192] [imageserv] [info] setting max batch size to 6
[2020-08-11 15:11:18.991] [api] [info] 127.0.0.1 "PUT /services/imageserv" 201 799
dede: nmsPlugin.cpp:54: virtual nvinfer1::Dims nvinfer1::plugin::DetectionOutput::getOutputDimensions(int, const nvinfer1::Dims*, int): Assertion `nbInputDims == 3' failed.
Aborted (core dumped)

tensorrt-oss builds successfully and the binaries used are the correct version. I do not know where this comes from :s

Did you start from a clean build tree? build/tensorrt-oss should be completely removed before rebuilding, because it is not automatically cleaned, patched, or rebuilt.

nmsPlugin.cpp is at build/tensorrt-oss/src/tensorrt-oss/plugin/nmsPlugin/nmsPlugin.cpp. Could you check line 54, i.e. the assertion at the beginning of getOutputDimensions? It should read ASSERT(nbInputDims == 3 || nbInputDims == 5);

If the source is patched and rebuilt correctly, then maybe it is the same issue as before: linking against the wrong libnvinfer_plugin.so.
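Both hypotheses can be checked from the shell. The sketch below looks for the patched assertion quoted above in nmsPlugin.cpp; it matches the exact string `nbInputDims == 3 || nbInputDims == 5`, so it assumes the patched line uses that spacing. The function name `check_nms_patch` is hypothetical.

```shell
check_nms_patch() {
  # $1: path to nmsPlugin.cpp, e.g.
  #     build/tensorrt-oss/src/tensorrt-oss/plugin/nmsPlugin/nmsPlugin.cpp
  [ -f "$1" ] || { echo "file not found: $1" >&2; return 2; }
  # The patched assertion accepts 3 or 5 inputs; the unpatched one only 3
  if grep -qF 'nbInputDims == 3 || nbInputDims == 5' "$1"; then
    echo "patched"
  else
    echo "not patched"
  fi
}
```

For the linking question, `ldd /path/to/dede | grep libnvinfer_plugin` shows which libnvinfer_plugin.so the dynamic loader resolves, assuming dede links the library directly (the binary's location depends on your build).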

Hey!
After a bit of struggle I finally succeeded in converting a detection model. I get good results but still have some weird bboxes.
See the first two bboxes :O

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":40653.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":-4440.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.875},{"bbox":{"ymax":-2368.0,"xmax":0.0,"xmin":0.0,"ymin":0.0},"cat":"0","prob":1.0},{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"PATH_MODEL/test_image.jpeg"}]}}
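The first two boxes above cannot be valid detections: one has a prob of 1.875, and both have a negative ymax and zero width. A small filter can flag such boxes in any response. This is a regex-based sketch, not a JSON parser: it assumes each box follows the `"bbox":{...},"cat":"...","prob":...` layout seen in this thread, and the name `flag_invalid_boxes` is hypothetical.

```shell
flag_invalid_boxes() {
  # Reads a DeepDetect /predict JSON response on stdin; prints one line per
  # box whose prob is outside [0, 1] or whose width or height is not positive.
  grep -o '"bbox":{[^}]*},"cat":"[^"]*","prob":[-0-9.eE]*' |
  awk '
    # Return the numeric value that follows "key": on the current line
    function get(key,    m) {
      if (match($0, "\"" key "\":[-0-9.eE]+")) {
        m = substr($0, RSTART, RLENGTH)
        sub(/^[^:]*:/, "", m)
        return m + 0
      }
      return 0
    }
    {
      xmin = get("xmin"); ymin = get("ymin")
      xmax = get("xmax"); ymax = get("ymax"); p = get("prob")
      if (p < 0 || p > 1 || xmax <= xmin || ymax <= ymin)
        printf "suspicious box %d: xmin=%g ymin=%g xmax=%g ymax=%g prob=%g\n", NR, xmin, ymin, xmax, ymax, p
    }'
}
```

On the response above it reports the first two boxes and passes the three face boxes.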

Btw, I checked your MR and saw that you renamed the binary folder for TensorRT (it did not work for me at first because of this; I had to update my Dockerfile).
However, the change is not applied everywhere: at line 909 of CMakeLists.txt you have

      -DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/bin

But to be consistent with your change, it should be

      -DTRT_BIN_DIR=${CMAKE_BINARY_DIR}/tensorrt-oss/src/tensorrt-oss-build

No?

Hi,

You're right: the way to define the output binary dir has changed in TRT, from TRT_BIN_DIR to TRT_OUT_DIR, and I did not notice it.
I just fixed it in the updated PR, so the older behavior is restored, i.e. the .so files are in ${CMAKE_BINARY_DIR}/tensorrt-oss/bin, i.e. build/tensorrt-oss/bin (I guess you'll have to roll back your Dockerfile).

Could you give me a test case where the bboxes are weird? Is it the same as above?

Thanks!

Hey, great!
It is the same test case as above!

Hey @fantes, any update on this?

Have you looked at #770? @fantes is away for the next 15 days.

Thanks for the reply.
Yep, I tried #770; however, I got some empty bboxes, as explained in #769 (comment).

Hi @YaYaB, unfortunately I cannot reproduce this: after compiling from scratch with -DUSE_TENSORRT=ON -DUSE_TENSORRT_OSS=ON -DUSE_CAFFE=ON and using your exact example, I get

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"imageserv","time":25366.0},"body":{"predictions":[{"classes":[{"bbox":{"ymax":1026.1483154296875,"xmax":1890.064453125,"xmin":1492.464111328125,"ymin":531.9712524414063},"cat":"1","prob":0.999995231628418},{"bbox":{"ymax":1498.189453125,"xmax":960.9320068359375,"xmin":201.79478454589845,"ymin":456.7858581542969},"cat":"1","prob":0.9999672174453735},{"last":true,"bbox":{"ymax":824.0353393554688,"xmax":2732.435791015625,"xmin":2390.67578125,"ymin":366.8230895996094},"cat":"1","prob":0.9999634027481079}],"uri":"/home/infantes/test_image.jpeg"}]}}

which seems correct (my commit is cb9df34)

@fantes I tried building from scratch again with that specific commit, and now it works.
Thanks!