ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

batching multi-client server

Gldkslfmsd opened this issue

> How to use this to allow multiple clients to connect when you host a server or create an API for live transcription?

I don't know, it's a topic that requires a separate issue. But first, there must be a Whisper backend that enables batching -- processing more inputs at once. If there is none, then each client needs its own server and GPU.

Thank you. Using one GPU per client is a tall ask for me, as there could be up to a dozen clients active at a time in my use case. I think there are a few backends that do support batched processing, e.g. https://github.com/Blair-Johnson/batch-whisper
Could you share any references, or point me to the parts of the code where changes would be needed to implement this?
Or is it alright if I create a new issue for this?

Originally posted by @umaryasin33 in #10 (comment)

I also found this fast batching whisper backend: https://github.com/Vaibhavs10/insanely-fast-whisper
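
For context, insanely-fast-whisper builds on the Hugging Face transformers ASR pipeline, which already supports batching. A minimal sketch of what batched transcription looks like there; the model name, file names, and batch size are illustrative placeholders, not a tuned setup:

```python
# Minimal sketch: the transformers ASR pipeline batches chunks from several
# inputs through the GPU in one forward pass.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # illustrative; any Whisper checkpoint
    device=0,                      # one shared GPU
    batch_size=8,                  # chunks processed per forward pass
)

# A list of inputs is transcribed together, amortizing the GPU cost.
results = asr(
    ["client_a.wav", "client_b.wav", "client_c.wav"],
    chunk_length_s=30,             # long audio is split into 30 s chunks
)
for r in results:
    print(r["text"])
```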

> you can point me to the parts where changes are needed to implement this.

First, you need a multi-client server. It handles each client the same way as the single-client server, but it needs a new subclass of ASRBase that connects through an API to a batching backend. Maybe the API could be shared with #34?
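
A rough sketch of such a subclass, assuming the ASRBase interface from whisper_online.py (load_model, transcribe, ts_words, segments_end_ts, use_vad); the endpoint URL and the JSON response format are invented for illustration:

```python
# Sketch only: an ASRBase subclass that sends audio to a remote batching
# service instead of running a local model.
import requests

from whisper_online import ASRBase


class BatchingAPIASR(ASRBase):
    """Delegates transcription to a shared batching server over HTTP."""

    def load_model(self, modelsize, cache_dir, model_dir):
        # No local model: the remote service owns the GPU.
        self.endpoint = "http://localhost:9000/transcribe"  # hypothetical

    def transcribe(self, audio, init_prompt=""):
        # audio: 1-D float32 numpy array at 16 kHz, as elsewhere in the repo
        resp = requests.post(
            self.endpoint,
            json={"audio": audio.tolist(), "init_prompt": init_prompt},
        )
        resp.raise_for_status()
        return resp.json()

    def ts_words(self, r):
        # Map the (assumed) response format to the (begin, end, word)
        # tuples that the other ASRBase subclasses return.
        return [(w["start"], w["end"], w["word"]) for w in r["words"]]

    def segments_end_ts(self, r):
        return [s["end"] for s in r["segments"]]

    def use_vad(self):
        pass  # VAD would have to run server-side
```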

And then you need the Whisper batching backend and its API -- I don't know which way is optimal: a subprocess, a network API, etc.
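
For illustration, one possible shape is an in-process micro-batcher: requests from all connected clients go into a queue, a worker drains up to MAX_BATCH of them (waiting at most MAX_WAIT_S for the batch to fill), runs one batched forward pass, and resolves each client's future. `batched_transcribe` is a placeholder for whichever batching backend is chosen:

```python
# Illustrative in-process micro-batcher (plain asyncio, no framework).
# `batched_transcribe` takes a list of audio arrays and returns one result
# per input, in order.
import asyncio

MAX_BATCH = 8       # largest batch per forward pass
MAX_WAIT_S = 0.05   # how long to let a batch fill before running it

queue: asyncio.Queue = asyncio.Queue()


async def transcribe(audio):
    """Called once per client request; returns when its result is ready."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((audio, fut))
    return await fut


async def batch_worker(batched_transcribe):
    while True:
        batch = [await queue.get()]  # block until at least one request
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and (t := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), t))
            except asyncio.TimeoutError:
                break
        # One GPU pass for the whole batch; the model call is synchronous,
        # so run it in a thread to keep the event loop responsive.
        audios = [a for a, _ in batch]
        results = await asyncio.to_thread(batched_transcribe, audios)
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)
```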

From a code-policy point of view, make a new entry point for the multi-client server. I suggest a separate project that would use Whisper-Streaming as a module; I may not be available to maintain it in this repo.

But more projects could use this feature, like https://github.com/ufal/correctable-lecture-translator . Open-sourcing and collaboration are welcome!
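
For concreteness, a minimal sketch of what such a separate entry point could look like, assuming OnlineASRProcessor and FasterWhisperASR from whisper_online.py; the port, chunk size, and wire format (raw float32 PCM at 16 kHz) are assumptions:

```python
# Sketch of a multi-client entry point in a separate project that uses
# whisper_streaming as a module: one shared ASR object serves all clients
# (ideally a batching-backed one like the BatchingAPIASR sketch above);
# each connection gets its own OnlineASRProcessor for the streaming state.
import asyncio

import numpy as np

from whisper_online import FasterWhisperASR, OnlineASRProcessor


async def handle_client(reader, writer, asr):
    online = OnlineASRProcessor(asr)  # lightweight per-connection state
    while True:
        chunk = await reader.read(65536)
        if not chunk:
            break
        online.insert_audio_chunk(np.frombuffer(chunk, dtype=np.float32))
        # process_iter() blocks on the model, so run it off the event loop.
        beg, end, text = await asyncio.to_thread(online.process_iter)
        if text:
            writer.write(f"{beg} {end} {text}\n".encode())
            await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    # FasterWhisperASR just makes the sketch self-contained; a real
    # multi-client deployment would plug in a batching-backed subclass.
    asr = FasterWhisperASR("en", "large-v2")
    server = await asyncio.start_server(
        lambda r, w: handle_client(r, w, asr), "0.0.0.0", 43007
    )
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```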