SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Triton Backend

dbasbabasi opened this issue

Hi, it works on the TRT backend, but I am trying to run it on the Triton backend. I changed the Docker parameters in the deploy_trt file, and it fails on warmup with the Triton backend. Do I need to change another config?

Hi! The Triton backend should work, but for now it's up to you to run a separate Triton Server container and provide its URL in the deploy_trt.sh config.
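For example, something like this (the image tag, ports, and model path are placeholders; pick the Triton version matching your TensorRT build):

docker run --gpus all --rm \
  -p 8001:8001 \
  -v /opt/triton_models:/models \
  nvcr.io/nvidia/tritonserver:21.09-py3 \
  tritonserver --model-repository=/models

Port 8001 is Triton's default gRPC port, which is what you'd point deploy_trt.sh at.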

Also, there is currently a known issue with inference of the SCRFD model on the Triton backend: Triton provides outputs as non-writable numpy arrays, but the new optimized SCRFD post-processing modifies the net output arrays in place to avoid excessive creation of numpy arrays. For now it can be fixed by replacing lines 332-334 of scrfd.py with:

# Copy the outputs so in-place post-processing doesn't modify Triton's read-only arrays
score_blob = np.copy(net_outs[idx][0])
bbox_blob = np.copy(net_outs[idx + self.fmc][0])
kpss_blob = np.copy(net_outs[idx + self.fmc * 2][0])

Thank you for your quick response.

Actually, I have some experience with Triton, but there is a problem getting the model metadata during model load, and the Docker container stops automatically. I tried to debug it, but I couldn't fix it. I used the following models and config.

max_size=640,640
det_model=retinaface_r50_v1
rec_model=arcface_r100_v1

Docker logs: [screenshot]

Have you changed localhost to the actual Triton server IP:gRPC port?
In Docker, localhost is the container itself, not the host machine.
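For illustration, the relevant deploy_trt.sh settings would look roughly like this (the IP is a placeholder for your host's LAN address; variable names per deploy_trt.sh):

# localhost here would resolve to the IFR container itself, so use the host IP
INFERENCE_BACKEND=triton
triton_uri=192.168.1.10:8001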

deploy_trt.txt

Here is my deploy_trt file. Yeah, I tried with the host IP and also with localhost, and opened the port in the docker run command.

You shouldn't bind Triton ports inside the insightface-rest container; that will cause exceptions when starting either the Triton server or the IFR container.

Yeah, I got it. I deleted the ports, ran the inference Docker container, and after that ran deploy_trt. It looks like the detection model was uploaded, and I can see the model output list during load, but I got another error for ArcFace. I am checking it. Thank you so much for your help.

IFR uses shared GPU memory to communicate with the Triton server, so it may not work if Triton is on a different host.
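For background, the CUDA shared memory flow with tritonclient looks roughly like this (a sketch; the region name, input name, and shape are assumptions, not the exact IFR code). The raw handle passed to the server is a cudaIPC handle, which is only valid between processes on the same machine and GPU:

import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = grpcclient.InferenceServerClient("localhost:8001")

# Allocate a GPU memory region and copy the input into it
input_data = np.zeros((1, 3, 112, 112), dtype=np.float32)
byte_size = input_data.nbytes
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
cudashm.set_shared_memory_region(shm_handle, [input_data])

# Register the region with Triton via its cudaIPC handle (same-host only)
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size)

# Point the inference input at the shared region instead of sending raw bytes
infer_input = grpcclient.InferInput("input.1", list(input_data.shape), "FP32")
infer_input.set_shared_memory("input_region", byte_size)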

Yeah, it works on the same machine. I could send a face detection request to Triton, but when I tried to load the face recognition model, it returned a CUDA shared memory error.

Also, I needed to change the face detection request dimensions to fix it.

I have just checked: everything seems to be working using the fix from #60 (comment).
I have followed these steps:

  1. Run deploy_trt.sh setting rec_batch_size = 32 and det_batch_size = 10.
  2. Wait until the TRT engines are built.
  3. Stop the IFR container.
  4. Copy the engines to the Triton server models folder under the following paths: {triton_models}/scrfd_10g_gnkps/1/model.plan and {triton_models}/glintr100/1/model.plan (layout sketched below).
  5. Run the Triton server and ensure it has actually started.
  6. Edit deploy_trt.sh, changing det_batch_size to 1 and INFERENCE_BACKEND to triton, and providing a valid triton_uri (your host machine's local IP address).
  7. Run deploy_trt.sh again.
  8. Your IFR container should now be using the Triton inference server.
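For step 4, the resulting Triton model repository would look like this (config.pbtxt files are optional for plain serving, but needed for the batching note below):

{triton_models}/
├── scrfd_10g_gnkps/
│   ├── config.pbtxt
│   └── 1/
│       └── model.plan
└── glintr100/
    ├── config.pbtxt
    └── 1/
        └── model.plan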

Though you should provide valid model configs to make use of dynamic batching.
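A minimal config.pbtxt for the recognition model might look like this (a sketch; the batch sizes and queue delay are illustrative, and max_batch_size must not exceed what the engine was built with):

name: "glintr100"
platform: "tensorrt_plan"
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 500
}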

Also keep in mind that creating shared memory regions uses additional GPU memory (about 110-150 MB per worker; with, say, 8 workers that's roughly an extra 1 GB), so ensure you have enough free GPU RAM.

Thank you so much! I used ONNX models for Triton. It now works for RetinaFace and ArcFace. Do you have a plan to add age/gender for Triton?

The gender/age model is temporarily not supported, since the g/a model requires different face crop preprocessing than the current glintr100 recognition model.

I used the RetinaFace ResNet model for face detection. I will try to run the g/a model. Thank you so much for your help. If you have a recommendation for g/a I will be really glad; otherwise I will close this issue.

You could implement it, but you'll have to make copies of the face crop numpy arrays at the recognition step; otherwise the g/a estimations will be totally wrong, due to the different preprocessing required for recognition and g/a estimation.
Copying numpy arrays will hit overall performance, though I haven't tested by how much yet.
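Illustratively, something like this (hypothetical helper names and normalization constants; assumes the crop is a float32 array; the point is that the g/a branch must get an untouched copy):

import numpy as np

def preprocess_rec(crop):
    # Illustrative ArcFace-style normalization done in place, mutating the crop
    crop -= 127.5
    crop *= 0.0078125
    return crop

def preprocess_ga(crop):
    # Illustrative g/a normalization; returns a new array
    return crop / 255.0

def process_face(crop, rec_model, ga_model):
    ga_crop = np.copy(crop)  # protect the g/a input from in-place mutation
    embedding = rec_model(preprocess_rec(crop))
    gender, age = ga_model(preprocess_ga(ga_crop))
    return embedding, gender, age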

Thank you! I used my own ONNX model for that and wrote a new client for these models. The results look good. Your repo is awesome. Thank you so much for your help!

Nice to hear that!
Have you used a publicly available model for g/a, or have you trained your own?

I used my own trained models. I converted them to ONNX and wrote a new client for age/gender, emotion, and mask detection. After the face crop, I passed the cropped face to inference. I see RetinaFace had a pretrained mask model, but it looks unavailable right now.
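For anyone following along, a bare-bones ONNX Runtime client of that shape might look like this (the model file, input layout, and normalization are assumptions, not the exact setup):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("genderage.onnx")  # hypothetical model file
input_name = session.get_inputs()[0].name

def infer_on_crop(face_crop):
    # face_crop: HxWx3 uint8 from the detector; convert to NCHW float32
    blob = face_crop.astype(np.float32).transpose(2, 0, 1)[None]
    return session.run(None, {input_name: blob})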


Sorry for the late reply, finally got some free time :)

You have separate models for GA, emotion, and mask detection working on 112x112 face crops?
That's interesting, since all the pretrained models for these tasks that I have seen expected a different input shape.
Could you point out where I could find the training code or models, if you have used public repos?

Hey, yeah, the GA model is separate. It's not a public repo, so I can't share it. Our models work with RetinaFace. I have no idea about public GA and mask models.