SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

Best quality result without speed / performance concern

MyraBaba opened this issue · comments

What are the best detection model (both accuracy and good landmarks) and recognition model if speed is not a concern? It could run slower but should produce the highest quality results.

For example, RetinaFace R50 is very good but old and a little bit slower. Is SCRFD superior to RetinaFace, especially for feeding the recognition models?

Is there any past experience or theoretical information ?

Best

From my observations, scrfd_10g_bnkps is better in terms of a lower false positive detection rate and better recall, though it produces worse landmarks for faces rotated by 90 degrees or more. RetinaFace gives inaccurate landmarks in those conditions too, but they are mostly closer to the expected positions.
Secondly, the bnkps model doesn't work well for large faces; it mostly misses them. The gnkps model was retrained to fix this issue, but it has slightly lower recall and landmark quality.
Though I must admit that the faces missed by the gnkps model are mostly useless for recognition, since they are of very low quality.

If you are not concerned about false detections and speed, RetinaFace is still the most universal detector, IMO.

As for recognition, I am mostly using glintr100 as the most accurate of the published models.
BUT, there is also the WebFace-based model w600k_r50 from the official package, which is almost twice as fast as glintr100 and has better accuracy on some benchmarks, as reported by the authors. I have tested it for clustering and it seems to produce clusters almost identical to those of the glintr100 model, but I can't tell you if it's really better or not.

@SthPhoenix Thanks for the reply. So best for accurate recognition is retinaface landmarks (R50).

have you tried yolo5l ?

Best

@SthPhoenix Thanks for the reply. So best for accurate recognition is retinaface landmarks (R50).

It'll work better in extreme cases, but in general the scrfd models should work pretty much the same.

have you tried yolo5l ?

I cloned the repo weeks ago, but still haven't found time to test it. It seems promising though.

@SthPhoenix What are your preferred metric and threshold for glintr100, according to your experience?

Also, the paper says the yolo5 landmarks are better than RetinaFace's, so does that mean it produces better accuracy?

Finally, what is your practical preference, ArcFace or CosFace? Or which distance are you satisfied with?

I'm using (1. + np.dot(A, B)) / 2. as a similarity metric, which gives a score in the 0.0-1.0 range.
From my experience, scores above 0.8 are exact matches.
0.78-0.8 might contain false positives, though only for really similar people and quite rarely.
0.75-0.78 contains more false positives, though still rarely.
Everything below 0.75 could contain many false positives.

Also take note that true positives might be anywhere in the range 0.6-1.0: poor image quality or side views may greatly decrease the score, and comparing a lot of poor images might give you lots of false positives with high scores.
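
As an illustration of the metric and score bands described above, here is a minimal sketch. It assumes the embeddings A and B are L2-normalized before the dot product (the formula only stays in the 0.0-1.0 range under that assumption), and the function names `similarity` and `score_band` are hypothetical, not part of this repo.

```python
import numpy as np

def similarity(a, b):
    """Map the cosine similarity of two embeddings to the 0.0-1.0 range."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Assumption: normalize here so that np.dot is a cosine in [-1, 1].
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float((1.0 + np.dot(a, b)) / 2.0)

def score_band(score):
    """Label a score using the rough bands quoted in the thread."""
    if score > 0.8:
        return "exact match"
    if score >= 0.78:
        return "rare false positives"
    if score >= 0.75:
        return "more false positives, still rare"
    return "possibly many false positives"
```

The bands are one person's field observations, so treat them as a starting point and validate the cut-offs on your own data.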

@SthPhoenix so you are using similarity metric instead of distance ?

Yes, I found it more human readable and convenient for display.

@SthPhoenix
I am testing LFW with glintr100.onnx. As you know, it has many small images.
Is there any way to index in batches to increase speed? GPU utilization is no more than 15%.

You can tweak batch sizes in deploy_trt.sh.
Also, you can set multiple workers, though you would need to tweak your LFW code to support multithreading.
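
On the client side, the multithreading tweak could look roughly like the sketch below: split the image list into batches and submit them from several threads. `extract_batch` is a hypothetical callable that you would write yourself to send one batch to the service and return one result per image; the batch size and worker count are arbitrary placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def process_in_batches(items, extract_batch, batch_size=64, workers=4):
    """Run extract_batch over all batches in parallel, keeping input order."""
    batches = list(chunked(items, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order the batches were submitted.
        results = pool.map(extract_batch, batches)
        return [item for batch in results for item in batch]
```

Threads are enough here because the client mostly waits on the server; the heavy work happens on the GPU side.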

@SthPhoenix

Funny thing that genderModel says Queen_Elizabeth_II is Male :)

have you tested ?

I always knew she has some secrets )
I'm not using gender/age detection anywhere; this model was added just for compatibility with the original lib. In my tests it showed bad accuracy too.

Have you tested IR-152 ? Looks better than the IR-50

Could you please provide link to the model you are talking about?

I've downloaded it, but still some more info is needed ) What dataset was used for training? Are there any performance metrics available?
I haven't seen any info about ir-152 based models in deepinsight repo.

@SthPhoenix is this one or glintr100 better, according to your field practice?

@MyraBaba unfortunately I can't check your model in the near future, but just from a benchmarks perspective they seem to be pretty much the same.
Though I must admit that for later models it seems the classical LFW, CFP, AgeDB, etc. benchmarks may not be representative enough.
Also, in many use cases slight accuracy gains may be negated by the slower inference of the IR-152 backbone.

BTW, I have tested the yolov5-face models. It seems they give slightly better landmarks in hard cases, though I haven't performed any speed tests yet.
I think if I'm able to run those models at speeds comparable with at least RetinaFace, I'll add them to this repo.

@MyraBaba, I have added support for yolov5-face models, you can check it )

looking now :) @SthPhoenix

Did you convert to onnx? I didn’t see the model, where is it :)

It's converted to onnx and will be automatically downloaded when you switch the detector to, for example, yolov5s-face in deploy_trt.sh

@SthPhoenix
I am trying to add mqt as an option for your project. Still on it.

Meanwhile, I would love to hear about your experience with realtime face recognition from a video feed.
Detecting faces at 25 fps means 25 faces every second for one person. If there are 3 persons, that means 3 x 25 = 75 faces per second have to be feature-extracted and searched in the db. But there are only 3 persons.

What would be the most performant solution to lower the number of faces sent to the recognition server for processing?

Lowering the fps could cause us to miss the desired face chip. Or, during detection, could a low-dimensional feature (12) be used to understand whether it is still the same person, and then not send it to the recognition server?

What would be your suggestion? Consider that there will be 10 cameras and a lot of faces coming in (i.e. 3 persons per camera at 25 fps = 750 faces per second to handle). Meanwhile, the recognition server re-detects the face for better alignment.

Would love to hear your opinion and suggestion

Hi! I don't have much experience with such a scenario. As I have noted somewhere else, you could try combining detection\feature extraction with some kind of tracker algorithm, like FastMOT. Also, if your scenario expects people entering a room\building, it might be a good idea to limit the face size and\or tune the detection area to exclude unwanted passers-by in the background.

PS: Wow, this question was posted on Jun 11, totally missed it.
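
To make the tracker idea concrete, here is a rough sketch of how track IDs could be used to cut the number of crops sent for recognition. It assumes some tracker already assigns a stable `track_id` to each detected face; `TrackThrottle`, `min_gap_frames`, and `count_forwarded` are hypothetical names, and this is not FastMOT's API.

```python
class TrackThrottle:
    """Forward at most one face per track every min_gap_frames frames."""

    def __init__(self, min_gap_frames=25):
        self.min_gap_frames = min_gap_frames
        self._last_sent = {}  # track_id -> frame index of the last forwarded crop

    def should_send(self, track_id, frame_index):
        last = self._last_sent.get(track_id)
        if last is None or frame_index - last >= self.min_gap_frames:
            self._last_sent[track_id] = frame_index
            return True
        return False

def count_forwarded(num_tracks, num_frames, min_gap_frames):
    """Simulate num_tracks faces visible in every frame and count forwarded crops."""
    throttle = TrackThrottle(min_gap_frames)
    sent = 0
    for frame_index in range(num_frames):
        for track_id in range(num_tracks):
            if throttle.should_send(track_id, frame_index):
                sent += 1
    return sent
```

With 3 tracked persons over one second at 25 fps and a gap of 25 frames, `count_forwarded(3, 25, 25)` forwards 3 crops instead of 75. A real setup would probably also pick the sharpest or most frontal crop within each window rather than the first one.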