ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Why is OnlineASRProcessor.commited always incremented?

bianxg opened this issue · comments

commented

Why is OnlineASRProcessor.commited always incremented (it includes all committed transcripts from the beginning)?
As time passes, "commited" grows large. Should we truncate it?

Hi,
thanks for the feedback. This is an unresolved corner case. Yes, you can truncate it after this line:

self.commited.extend(o)

However, the last 200 characters must remain. Another size might also work; I haven't experimented with the 200 used here:

while p and l < 200: # 200 characters prompt size

It can be tuned or improved.
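A minimal sketch of that truncation, not code from the repository. It assumes each entry of "commited" is a (start, end, text) tuple and counts one extra character per word for a separator; both are assumptions, and the helper name truncate_committed and its min_chars parameter are hypothetical. If other parts of the processor read older entries, those would need to be kept as well.

```python
def truncate_committed(committed, min_chars=200):
    """Return the shortest tail of `committed` holding at least `min_chars` characters.

    Assumption: each entry is a (start, end, text) tuple.
    If the whole list is shorter than `min_chars`, it is returned unchanged.
    """
    total = 0
    keep_from = len(committed)
    # Walk backwards from the newest word until enough characters are covered.
    while keep_from > 0 and total < min_chars:
        keep_from -= 1
        # +1 stands in for a separator between words (an assumption).
        total += len(committed[keep_from][2]) + 1
    return committed[keep_from:]


# Example: 100 four-letter words; each counts as 5 characters here,
# so the 40 newest words are enough to reach 200.
history = [(float(i), float(i + 1), "word") for i in range(100)]
tail = truncate_committed(history)
```

Inside the class this could be applied right after the extend call, for example as self.commited = truncate_committed(self.commited), with min_chars kept at or above the prompt size.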

Does the current version, without truncation, harm performance? Do you have memory issues? Or is it just annoying in the log?
After how much audio, or how many words? Usually nobody runs a single audio stream for so long that it becomes an issue.

commented

Thanks. I just reviewed the code and think it will hurt performance and memory. I am thinking of its use in video conferencing.