SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.


benchmark and mqtt

MyraBaba opened this issue · comments

Hi, @SthPhoenix

Do you have any benchmark results for demo_client on an RTX 2080 or similar?

What hardware config would it take to process 1000 photos per second?

Best

Hi! I'll check benchmarks with the latest version, but if I recall correctly you can achieve speeds around 1000 im/sec on an RTX 2080 only with the fastest, lower-quality models and small images, since larger images require more time for decoding.
Also, the resulting speed greatly depends on the number of faces per image and on the target size: e.g. 608x608 is slightly faster than the default 640x640 at the cost of some detection accuracy. In some cases it can be set to even lower values like 320x320.

Images will be 112x112 and pre-aligned: just the face, for recognition only.

6-10 detection clients send faces (112x112, aligned) for recognition. This is the spec. What throughput would you expect?

mqtt could be the fast communication layer

Well, in this scenario, with the w600k_r50 model and batch_size=64, you could achieve up to 2k faces/sec if you use a setup with two RTX 2080 GPUs: one for detection, one for recognition.
Your MQTT client should group incoming faces into batches of up to 64 to achieve maximum performance.

How is the accuracy of w600k_r50 compared to glintr100?

If we use glintr100, is performance (faces/sec) degraded 2x?

Waiting for 64 photos could be a problem if there is only one person in the scene: 64 photos ≈ 3 sec, so there would be at least a 3 sec delay.

So we need to send frames in real time, one by one, and get the result back in under 0.5 sec or so.

We can send the 112x112 aligned face to the recognition server; we can handle that part using only CPU.

MQTT or ZeroMQ look like the fastest comm layers. TCP and REST carry too much baggage/handshaking etc., IMHO.

Accuracy is pretty good; I'd say it mostly performs better.

Yes, glintr100 is approx. 2x slower.

You can set a time threshold, for example 0.1 s, for waiting for a batch; if fewer images arrive within that period, send whatever you've got at that moment.
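A minimal sketch of that timeout-bounded batching, using Python's standard `queue` module (the function name and parameter values here are illustrative, not part of this project):

```python
import queue
import time

def collect_batch(q, max_batch=64, timeout=0.1):
    """Collect up to max_batch items from q, waiting at most `timeout`
    seconds overall; return whatever has arrived (possibly fewer)."""
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time budget spent: ship a partial batch
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch
```

With a busy scene the batch fills to 64 almost immediately; with a single person in front of the camera the caller still gets that one face after at most 0.1 s, keeping latency well under the 0.5 s target.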

Yes, REST certainly has some overhead. If you're comfortable with MQTT or ZeroMQ, it should be easy to replace REST with a queue consumer, though REST is simpler for many developers, so it's the default interface for this project )

@SthPhoenix

I will report the mqtt/0mq benchmark results.

What would be the best practical approach for an information screen driven by facial recognition?

Detection is always on, but once a person in front of the kiosk is recognized, we should keep that session instead of recognizing the same person again and again; when the person leaves the kiosk, the full process starts over.

Also, CPU overhead should be kept to a minimum: recognition on the server, detection on the edge (kiosk).
It's an informational kiosk working with facial recognition.

Is tracking the best approach, or what would be?

Hi! I haven't worked in this direction yet, but you could try looking at the FastMOT object tracker, which seems like a good fit for this task.

Alternatively, you could try using a faster recognition model like w600k_mbf on the edge with something like faiss to locally find duplicate faces and send only unique ones to the heavier net on the server side.
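The edge-side duplicate filter could look roughly like this. For brevity this sketch brute-forces cosine similarity with NumPy; the `DuplicateFilter` class and its `threshold` value are hypothetical, and with a large gallery you would replace the matrix product with a faiss inner-product index search:

```python
import numpy as np

class DuplicateFilter:
    """Remember embeddings of recently seen faces and drop near-duplicates,
    so only unique faces get forwarded to the server-side model."""

    def __init__(self, threshold=0.6, max_size=1000):
        self.threshold = threshold  # cosine similarity above this = duplicate
        self.max_size = max_size    # cap on remembered embeddings
        self.gallery = np.empty((0, 0))

    def is_new(self, emb):
        emb = emb / np.linalg.norm(emb)  # normalize so dot = cosine sim
        if self.gallery.shape[0]:
            sims = self.gallery @ emb
            if sims.max() >= self.threshold:
                return False  # already seen a very similar face
        if self.gallery.size == 0:
            self.gallery = emb[None, :]
        else:
            # keep only the most recent max_size embeddings
            self.gallery = np.vstack([self.gallery, emb])[-self.max_size:]
        return True
```

The same structure also gives the kiosk its "session" behavior: while the person stays in frame their embedding keeps matching the gallery, so no repeated server calls are made.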

Though it looks to me like both variants would require quite a lot of additional logic; it would be very interesting to see your implementation once you finish it.

@SthPhoenix thanks. One more thing regarding the search feature in the current vector DB:
what would be the best/fastest/most economical way to search a 100 GB feature-vector DB? 👍
1 - C++ / CPU
2 - GPU
3 - FPGA

You can try checking FAISS as a base for vector search; it can work on both CPU and GPU.
CPU is the most practical and cheap option, though if you need really fast search, GPU is better.

In FAISS you can use PQ or SQ compression, which helps lower the memory footprint at the cost of some accuracy. SQ8, for example, can reduce 100 GB to approx. 25 GB with almost no precision penalty. PQ can reduce the size much further, though you might have to spend a lot of time tuning parameters to achieve high precision.
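As a rough illustration of why SQ8 gives about a 4x reduction (each float32 value of 4 bytes becomes one uint8 byte), here's a toy per-dimension scalar quantizer in NumPy. faiss's scalar quantizer handles this (plus training subtleties) internally, so treat this only as a sketch of the arithmetic:

```python
import numpy as np

def sq8_train(x):
    """Learn per-dimension min/max over the training vectors."""
    return x.min(axis=0), x.max(axis=0)

def sq8_encode(x, lo, hi):
    """Map each float32 value to a uint8 code in [0, 255]."""
    scale = np.where(hi > lo, hi - lo, 1.0)
    return np.clip(np.round((x - lo) / scale * 255), 0, 255).astype(np.uint8)

def sq8_decode(codes, lo, hi):
    """Reconstruct approximate float32 values from the codes."""
    scale = np.where(hi > lo, hi - lo, 1.0)
    return codes.astype(np.float32) / 255 * scale + lo
```

For a 512-d float32 embedding that's 2048 bytes compressed to 512 bytes per vector, which is where the ~100 GB to ~25 GB estimate comes from; the per-value error is bounded by half a quantization step per dimension.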