Is this realtime?

fire17 opened this issue 2 months ago

Hi there,
First of all, amazing project!
I was wondering what the expected latency is for a short audio clip (2-5 seconds).
Is it instant? Less than a second?

I'm wondering whether this can be used for real-time local AI voice/video conversations.
Anything over a second is not usable in real-time, user-facing applications,
but it could still be good for plenty of other use cases.

Let me know. It would also be nice if the answer were split between cloud GPUs and local consumer GPUs (for local use).
Thanks, and all the best!

puffy310 · Answer 1 · Tue Jun 25 2024 16:53:38 GMT+0800 (China Standard Time)

10 minutes for 5 seconds of audio… definitely not real time. I hope latency improves.
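To put that number in perspective, here is a rough sketch of the real-time factor (processing time divided by clip duration), using only the figures reported in this thread. The function name is hypothetical, and a value of 1.0 or less would mean "at least real time".

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by clip duration; <= 1.0 means at least real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported in this thread: ~10 minutes of processing for 5 seconds of audio.
rtf = real_time_factor(10 * 60, 5)
print(f"RTF = {rtf:.0f}x")  # RTF = 120x

# Speedup needed to finish the same 5 s clip within the 1-second budget from the question.
required_speedup = (10 * 60) / 1.0
print(f"Speedup needed for a 1 s turnaround: {required_speedup:.0f}x")  # 600x
```

In other words, at the reported speed the model runs roughly two orders of magnitude slower than real time.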

Xupeng (Tony) Tong · Answer 2 · Sun Jul 07 2024 18:45:18 GMT+0800 (China Standard Time)

> 10 minutes for 5 seconds of audio… definitely not real time, I hope latency is improved.

Which GPU is being used?

AmoMTL · Answer 3 · Fri Aug 09 2024 00:08:27 GMT+0800 (China Standard Time)

Could it run in real time on a more powerful GPU? @puffy310, were you running inference on your local machine?

puffy310 · Answer 4 · Fri Aug 09 2024 00:14:01 GMT+0800 (China Standard Time)

I was not using a local GPU; I was using an L4 rented on HF with https://huggingface.co/spaces/fudan-generative-ai/hallo. It is still early tech, and I have not checked in a month, so inference time may have improved significantly.

puffy310 · Answer 5 · Fri Aug 09 2024 00:16:48 GMT+0800 (China Standard Time)

In theory, anything can be run in real time with powerful enough hardware. I do not know the GPU threshold needed to run this at 8 or 12 FPS; it's likely that even 8x H100 wouldn't get close. Maybe someone from Fudan can give more insight.
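For a sense of what 8 or 12 FPS would require, a minimal sketch of the per-frame time budget is below. The helper name is hypothetical, and the "observed" column assumes (purely for illustration) that the ~10 minutes reported earlier in this thread is spread evenly over a 5 s clip rendered at that frame rate; the actual output frame rate and cost per frame of the model are not stated here.

```python
def frame_budget_ms(fps: float) -> float:
    """Maximum time per generated frame (ms) for output to keep pace with playback."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


REPORTED_SECONDS = 10 * 60  # ~10 minutes for a 5 s clip, as reported earlier in this thread
CLIP_SECONDS = 5

for fps in (8, 12):
    frames = CLIP_SECONDS * fps  # frames in a 5-second clip at this rate
    # Assumption: processing time is spread evenly across frames rendered at this FPS.
    observed_ms = REPORTED_SECONDS * 1000 / frames
    print(f"{fps} FPS: budget {frame_budget_ms(fps):.1f} ms/frame, "
          f"observed ~{observed_ms:.0f} ms/frame over {frames} frames")
```

Under that assumption, each frame would need to be produced about 120x faster than reported to keep up with playback at either frame rate.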