SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Does forcing ONNX drop GPU support?

MyraBaba opened this issue · comments

Hi,

When I force the use of ONNX, I see the model is not using the GPU, only the CPU. Can this be made configurable?

Docker images are built with the CPU version of onnxruntime. Its intended use case is as a fallback when no GPU is available.
You can install onnxruntime-gpu, though in its latest versions you also have to pass a GPU execution provider as an argument to onnxruntime.InferenceSession.
But I highly recommend using TRT on GPU, since it's faster than onnxruntime, and there are also some image preprocessing optimizations in the TRT backend.
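For illustration, a minimal sketch of what that looks like with onnxruntime-gpu installed (model path is a placeholder, not IFR code):

```python
# Requires `pip install onnxruntime-gpu` instead of the CPU-only `onnxruntime`.
import onnxruntime

# Should include "CUDAExecutionProvider" if the GPU build and CUDA libs are found.
print(onnxruntime.get_available_providers())
print(onnxruntime.get_device())  # "GPU" for the GPU build, otherwise "CPU"

# Recent onnxruntime versions require the providers list to be passed explicitly;
# listing CPUExecutionProvider last keeps a fallback if CUDA is unavailable.
sess = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path to a detection/recognition model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # confirms which providers were actually registered
```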

Is there any speed / accuracy difference between TRT and ONNX?

It would be good if changing trt to onnx in deploy.sh (in the GPU version) automatically ran on GPU (onnxruntime-gpu).

TRT is significantly faster, especially with fp16 inference (force_fp16=True) on GPUs that support it. There is some accuracy degradation, but embeddings computed with TRT and onnxruntime usually have a similarity of around 0.99.

It's trivial to add support for onnxruntime-gpu, but I'm not sure it's actually useful, since TRT performs much better and, as I said before, there are optimizations in the IFR code for TRT inference.
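As a rough sketch of how the "around 0.99 similar" comparison can be checked (presumably cosine similarity between embeddings of the same aligned face from the two backends; the 512-d placeholder vectors below only keep the snippet runnable):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In practice emb_trt and emb_onnx would be embeddings of the same face crop
# returned by the TRT and onnxruntime backends; random placeholders are used here.
emb_trt = np.random.rand(512).astype(np.float32)
emb_onnx = emb_trt + np.random.normal(0, 0.01, 512).astype(np.float32)
print(cosine_similarity(emb_trt, emb_onnx))  # ~0.99 expected with fp16 TRT
```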

You should also provide the CUDA execution provider argument in the latest versions of onnxruntime.

Add it to all lines with onnxruntime.InferenceSession in onnxrt_backend.py, for example as sketched below.
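A hedged sketch of the kind of change meant here; the actual session-creation code and model paths in onnxrt_backend.py may look different:

```python
import onnxruntime

model_path = "models/onnx/arcface_r100_v1/arcface_r100_v1.onnx"  # illustrative path

# Before (CPU-only build, provider chosen implicitly):
# session = onnxruntime.InferenceSession(model_path)

# After (explicit CUDA provider with CPU fallback, required as an explicit
# argument in recent onnxruntime-gpu releases):
session = onnxruntime.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```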

That's pretty slow. What GPU and model parameters have you used?

Try enabling force_fp16 then; I'm getting around 145-150 img/sec with one worker and 10 client threads with fp16 enabled on an RTX 2080 Super for Stallone.jpg.
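For reference, a rough client-side throughput check along those lines: several threads repeatedly posting the same image to the REST service. The endpoint path, port, and payload shape below are assumptions about the IFR API, not verified against it; adjust them to match your deployment.

```python
import base64
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:18081/extract"  # assumed IFR endpoint and default port
THREADS = 10                            # matches the 10 client threads above
REQUESTS_PER_THREAD = 50

with open("Stallone.jpg", "rb") as f:
    payload = {"images": {"data": [base64.b64encode(f.read()).decode()]}}

def worker(_):
    # Each thread sends REQUESTS_PER_THREAD sequential requests.
    for _ in range(REQUESTS_PER_THREAD):
        requests.post(URL, json=payload, timeout=30).raise_for_status()
    return REQUESTS_PER_THREAD

start = time.time()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    total = sum(pool.map(worker, range(THREADS)))
print(f"{total / (time.time() - start):.1f} img/sec")
```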