ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

VAD and whisper-timestamped

Jeronymous opened this issue

First, thank you. I am super happy to see whisper-timestamped used in such a good project.
Having Whisper streamed in real time is a super feature!

I see here that VAD is not available when using the whisper-timestamped backend:

def use_vad(self):
    raise NotImplemented("Feature use_vad is not implemented for whisper_timestamped backend.")

But VAD IS implemented in whisper-timestamped (it was even before faster-whisper integrated it). It's currently based on SILERO (same as what was done in faster-whisper).
Am I missing a sticking point? (Maybe the fact that things required for VAD are not by default in the requirements?)
I can contribute if help is needed on this.

(VAD is important to prevent some hallucinations of Whisper models, and make timestamps more accurate)

Also, I want to mention:
After being disappointed with weird results on some files, I opened a branch to replace SILERO with AUDITOK: linto-ai/whisper-timestamped#78 (see the linked issue for an illustration of possible "hallucinations" of Silero).
I have had good experience with Auditok. I was hoping for some user feedback to confirm this before merging into master. But as it's not coming, maybe we just need to establish a benchmark to confirm the improvement.

Hi, thanks for the feedback.
Yes, I know that VAD is in whisper_timestamped. I put NotImplemented because I primarily use and focus on the faster-whisper backend. Feel free to implement it. It should be easy: just pass a parameter to the transcribe function, analogous to

self.transcribe_kargs["vad_filter"] = True
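
For illustration, here is a minimal sketch of what the equivalent could look like for the whisper_timestamped backend. It assumes that the whisper-timestamped transcribe function accepts a `vad=True` keyword and that the backend forwards `transcribe_kargs` to it. The class name `TimestampedBackendSketch` and the `transcribe_fn` argument are hypothetical stand-ins, not the actual code of this repository.

```python
class TimestampedBackendSketch:
    """Hypothetical stand-in for the whisper_timestamped backend class."""

    def __init__(self):
        # Extra keyword arguments forwarded to every transcribe call.
        self.transcribe_kargs = {}

    def use_vad(self):
        # Assumption: the underlying transcribe function takes vad=True.
        self.transcribe_kargs["vad"] = True

    def transcribe(self, audio, transcribe_fn):
        # transcribe_fn stands in for the real transcribe call
        # (with the model already bound); it is injected here so the
        # sketch runs without loading any model.
        return transcribe_fn(audio, **self.transcribe_kargs)


def fake_transcribe(audio, **kwargs):
    # Echo the received keyword arguments so the forwarding is visible.
    return kwargs


backend = TimestampedBackendSketch()
backend.use_vad()
forwarded = backend.transcribe([0.0, 0.0], fake_transcribe)
print(forwarded)
```

The only real change compared to the faster-whisper line above is the keyword name, so the rest of the streaming logic should not need to know which backend is active.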

SILERO vs AUDITOK is a topic for another issue. I don't have feedback.

But I realized that VAD is currently used inefficiently. On every update it is run over the whole audio buffer. It could instead be used to cut silence out of the buffer, so that the next update is faster. This could be improved.
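
A rough sketch of the buffer-trimming idea, under stated assumptions: the VAD has already produced speech segments as (start, end) pairs in seconds relative to the start of the buffer, and the buffer is a plain sequence of samples. The function name `trim_leading_silence` and its parameters are hypothetical, and this only drops silence at the beginning of the buffer, which is the simplest case.

```python
def trim_leading_silence(buffer, speech_segments, sample_rate=16000, buffer_offset=0.0):
    """Drop samples that precede the first detected speech segment.

    buffer          -- sequence of audio samples (list or array-like)
    speech_segments -- (start, end) pairs in seconds, relative to buffer start
    buffer_offset   -- absolute time in seconds of the first buffer sample
    Returns (trimmed_buffer, new_buffer_offset).
    """
    if not speech_segments:
        # No speech at all: discard the whole buffer and advance the offset.
        return buffer[:0], buffer_offset + len(buffer) / sample_rate

    first_start = min(start for start, _ in speech_segments)
    # Convert seconds to a sample index and clamp it to the buffer bounds.
    cut = max(0, min(round(first_start * sample_rate), len(buffer)))
    # Keeping the offset in sync preserves absolute timestamps.
    return buffer[cut:], buffer_offset + cut / sample_rate


# Toy example: 8 samples at 4 Hz, speech detected from 0.5 s to 1.0 s.
samples = [0, 0, 5, 7, 6, 4, 0, 0]
trimmed, offset = trim_leading_silence(samples, [(0.5, 1.0)], sample_rate=4, buffer_offset=10.0)
print(trimmed, offset)
```

With the silence removed once, later updates would only need to process the shorter buffer, while the returned offset keeps the emitted timestamps aligned with the original audio.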

> SILERO vs AUDITOK is a topic for another issue. I don't have feedback.

@Jeronymous, please open a separate issue about SILERO vs AUDITOK if you have test results to share.