SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Triton Backend

dbasbabasi opened this issue

Hi, it works on the TRT backend, but I am trying to run it on the Triton backend. I changed the Docker parameters in the deploy_trt file, and it fails on warmup with the Triton backend. Do I need to change another config?

Hi! The Triton backend should work, but for now it's up to you to run a separate Triton Server container and provide its URL in the deploy_trt.sh config.
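For example, something like this (the image tag, ports, and model path are placeholders; pick the Triton version matching your TensorRT build):

docker run --gpus all --rm \
  -p 8001:8001 \
  -v /opt/triton_models:/models \
  nvcr.io/nvidia/tritonserver:21.09-py3 \
  tritonserver --model-repository=/models

Port 8001 is Triton's default gRPC port, which is what you'd point deploy_trt.sh at.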

Also, there is currently a known issue with inference of the SCRFD model on the Triton backend: Triton provides outputs as non-writable numpy arrays, but the new optimized SCRFD post-processing modifies the net output arrays in place to avoid excessive creation of numpy arrays. For now it can be fixed by replacing lines 332-334 of scrfd.py with:

# Copy the outputs so in-place post-processing doesn't modify Triton's read-only arrays
score_blob = np.copy(net_outs[idx][0])
bbox_blob = np.copy(net_outs[idx + self.fmc][0])
kpss_blob = np.copy(net_outs[idx + self.fmc * 2][0])

Thank you for your quick response.

Actually, I have some experience with Triton, but there is a problem getting the model metadata during model load, and the Docker container stops automatically. I tried to debug it, but I couldn't fix it. I used the following models and config.

max_size=640,640
det_model=retinaface_r50_v1
rec_model=arcface_r100_v1

Docker logs: [screenshot]

Have you changed localhost to the actual Triton server IP:gRPC port?
In Docker, localhost is the container itself, not the host machine.
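For illustration, the relevant deploy_trt.sh settings would look roughly like this (the IP is a placeholder for your host's LAN address; variable names per deploy_trt.sh):

# localhost here would resolve to the IFR container itself, so use the host IP
INFERENCE_BACKEND=triton
triton_uri=192.168.1.10:8001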

deploy_trt.txt

Here is my deploy_trt file. Yeah, I tried with the host IP and also with localhost, and opened the port in the docker run command.

You shouldn't bind Triton ports inside the insightface-rest container; that will cause exceptions when starting either the Triton server or the IFR container.

Yeah, I got it. I deleted the ports, ran the inference Docker container, and after that ran deploy_trt. It looks like the detection model was uploaded, and I can see the model output list during load, but I got another error for ArcFace. I am checking it. Thank you so much for your help.

IFR uses shared GPU memory to communicate with the Triton server, so it may not work if Triton is on a different host.
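For background, the CUDA shared memory flow with tritonclient looks roughly like this (a sketch; the region name, input name, and shape are assumptions, not the exact IFR code). The raw handle passed to the server is a cudaIPC handle, which is only valid between processes on the same machine and GPU:

import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = grpcclient.InferenceServerClient("localhost:8001")

# Allocate a GPU memory region and copy the input into it
input_data = np.zeros((1, 3, 112, 112), dtype=np.float32)
byte_size = input_data.nbytes
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
cudashm.set_shared_memory_region(shm_handle, [input_data])

# Register the region with Triton via its cudaIPC handle (same-host only)
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size)

# Point the inference input at the shared region instead of sending raw bytes
infer_input = grpcclient.InferInput("input.1", list(input_data.shape), "FP32")
infer_input.set_shared_memory("input_region", byte_size)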

Yeah, it works on the same machine. I could send a face detection request to Triton, but when I tried to load the face recognition model, it returned a CUDA shared memory error.

Also, I needed to change the face detection request dimensions to fix it.

I have just checked: everything seems to be working using the fix from #60 (comment).
I have followed these steps:

  1. Run deploy_trt.sh setting rec_batch_size = 32 and det_batch_size = 10.
  2. Wait until the TRT engines are built.
  3. Stop the IFR container.
  4. Copy the engines to the Triton server models folder under the following paths: {triton_models}/scrfd_10g_gnkps/1/model.plan and {triton_models}/glintr100/1/model.plan (layout sketched below).
  5. Run the Triton server and ensure it has actually started.
  6. Edit deploy_trt.sh, changing det_batch_size to 1 and INFERENCE_BACKEND to triton, and providing a valid triton_uri (your host machine's local IP address).
  7. Run deploy_trt.sh again.
  8. Your IFR container should now be using the Triton inference server.
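For step 4, the resulting Triton model repository would look like this (config.pbtxt files are optional for plain serving, but needed for the batching note below):

{triton_models}/
├── scrfd_10g_gnkps/
│   ├── config.pbtxt
│   └── 1/
│       └── model.plan
└── glintr100/
    ├── config.pbtxt
    └── 1/
        └── model.plan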

Though you should provide valid model configs to make use of dynamic batching.
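A minimal config.pbtxt for the recognition model might look like this (a sketch; the batch sizes and queue delay are illustrative, and max_batch_size must not exceed what the engine was built with):

name: "glintr100"
platform: "tensorrt_plan"
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 500
}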

Also keep in mind that creating shared memory regions uses additional GPU memory (about 110-150 MB per worker; with, say, 8 workers that's roughly an extra 1 GB), so ensure you have enough free GPU RAM.

Thank you so much! I used ONNX models for Triton. It now works for RetinaFace and ArcFace. Do you have a plan to add age/gender for Triton?

The gender/age model is temporarily not supported, since the g/a model requires different face crop preprocessing than the current glintr100 recognition model.

I used the RetinaFace ResNet model for face detection. I will try to run the g/a model. Thank you so much for your help. If you have a recommendation for g/a I will be really glad; otherwise I will close this issue.

You could implement it, but you'll have to make copies of the face crop numpy arrays at the recognition step; otherwise the g/a estimations will be totally wrong, due to the different preprocessing required for recognition and g/a estimation.
Copying numpy arrays will hit overall performance, though I haven't tested by how much yet.
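Illustratively, something like this (hypothetical helper names and normalization constants; assumes the crop is a float32 array; the point is that the g/a branch must get an untouched copy):

import numpy as np

def preprocess_rec(crop):
    # Illustrative ArcFace-style normalization done in place, mutating the crop
    crop -= 127.5
    crop *= 0.0078125
    return crop

def preprocess_ga(crop):
    # Illustrative g/a normalization; returns a new array
    return crop / 255.0

def process_face(crop, rec_model, ga_model):
    ga_crop = np.copy(crop)  # protect the g/a input from in-place mutation
    embedding = rec_model(preprocess_rec(crop))
    gender, age = ga_model(preprocess_ga(ga_crop))
    return embedding, gender, age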

Thank you! I used my own ONNX model for that and wrote a new client for these models. The results look good. Your repo is awesome. Thank you so much for your help!

Nice to hear that!
Have you used a publicly available model for g/a, or have you trained your own?

I used my own trained models. I converted them to ONNX and wrote a new client for age/gender, emotion, and mask detection. After the face crop, I passed the cropped face to inference. I see RetinaFace had a pretrained mask model, but it looks unavailable right now.
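For anyone following along, a bare-bones ONNX Runtime client of that shape might look like this (the model file, input layout, and normalization are assumptions, not the exact setup):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("genderage.onnx")  # hypothetical model file
input_name = session.get_inputs()[0].name

def infer_on_crop(face_crop):
    # face_crop: HxWx3 uint8 from the detector; convert to NCHW float32
    blob = face_crop.astype(np.float32).transpose(2, 0, 1)[None]
    return session.run(None, {input_name: blob})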


Sorry for the late reply, finally got some free time :)

You have separate models for GA, emotion, and mask detection working on 112x112 face crops?
That's interesting, since all the pretrained models for these tasks that I have seen expected a different input shape.
Could you point out where I could find the training code or models, if you have used public repos?

Hey, yeah, the GA model is separate. It's not a public repo, so I can't share it. Our models work with RetinaFace. I have no idea about public GA and mask models.