I stop receiving predictions while streaming after some time

Question

jayavanth opened this issue 7 months ago

I'm running

python whisper_online_server.py --model base.en --host localhost --port 43001 --vad
arecord -f S16_LE -c1 -r 16000 -t raw -D default | nc localhost 43001

and the client stops receiving predictions after 50-70 s. If I restart the client, it starts working again. I noticed that this happens more often with audio that has frequent silences. Recordings where the "silence" is filled with background noise did okay.

I'm also wondering how I can get word-level timestamps. Is there an option for that? I'm currently getting output like this:

20530 21390  Just saying, you know what?
24230 24770  before anything, because
24770 24830  for
24830 25770  anything because we've handled business
35130 36750  Well, I mean, the most
36770 37650  recent two rounds at
37650 38970  NIP's been able to put on the board have
38970 40210  been the result of some
40210 41190  form of aggression for
41190 42090  CI and the push into
42090 43250  halls to actually try and fight
43250 43550  out the
43550 45470  balconies at this point. And then now
45850 47010  here trying to be aggressive with
47010 48130  a boost off at the half wall.
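To check whether the stalls line up with silences, it can help to parse this output and look for jumps between consecutive segments. The following is a minimal sketch, assuming each line has the form `begin end text` with both times in milliseconds. `Segment`, `parse_line`, `find_gaps`, and the 2000 ms threshold are hypothetical names and values chosen for illustration, not part of whisper_streaming.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    beg_ms: int
    end_ms: int
    text: str


def parse_line(line: str) -> Optional[Segment]:
    """Parse one 'begin end text' line; return None if it does not match."""
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 3:
        return None
    try:
        return Segment(int(parts[0]), int(parts[1]), parts[2])
    except ValueError:
        # First two fields were not integers, so this is not a transcript line.
        return None


def find_gaps(segments: List[Segment], min_gap_ms: int = 2000) -> List[Tuple[int, int]]:
    """Return (end of previous, begin of next) pairs at least min_gap_ms apart."""
    gaps = []
    for prev, cur in zip(segments, segments[1:]):
        if cur.beg_ms - prev.end_ms >= min_gap_ms:
            gaps.append((prev.end_ms, cur.beg_ms))
    return gaps


# First five lines of the output quoted above.
sample = """\
20530 21390  Just saying, you know what?
24230 24770  before anything, because
24770 24830  for
24830 25770  anything because we've handled business
35130 36750  Well, I mean, the most
"""

segments = [s for s in map(parse_line, sample.splitlines()) if s is not None]
gaps = find_gaps(segments)
for end_prev, beg_next in gaps:
    print(f"no text between {end_prev} ms and {beg_next} ms")
```

A gap here only means that no text was emitted for that span. It does not by itself distinguish real silence from a stalled server.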

Thank you for this library 🙌

Dominik Macháček · Answer 1 · Tue Jan 02 2024 16:32:37 GMT+0800 (China Standard Time)

Hi, are you using --vad?

Dominik Macháček · Answer 2 · Tue Jan 02 2024 16:37:54 GMT+0800 (China Standard Time)

> Also wondering how I can get word level timestamping. Is there an option for that? Because I'm currently getting something like this

You can print the word-level timestamps by overriding or rewriting this function: https://github.com/ufal/whisper_streaming/blob/c236a9984f7e71465eb04a63b5545198fce1c8eb/whisper_online.py#L412C4-L425C23
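As a rough illustration of what such a rewrite could emit, here is a minimal sketch. It assumes the linked function receives the confirmed words as `(begin, end, text)` tuples with times in seconds, which should be checked against the linked source. `format_words` is a hypothetical standalone helper, not a whisper_streaming function, and the sample values are invented.

```python
from typing import Iterable, List, Tuple

# Assumed shape of one timestamped word: (begin_s, end_s, text).
Word = Tuple[float, float, str]


def format_words(words: Iterable[Word], offset: float = 0.0) -> List[str]:
    """Format each word on its own 'begin_ms end_ms text' line
    instead of joining all words into a single segment line."""
    lines = []
    for beg, end, text in words:
        lines.append(
            "%1.0f %1.0f %s" % ((offset + beg) * 1000, (offset + end) * 1000, text.strip())
        )
    return lines


# Invented sample values, for illustration only.
words = [(20.53, 20.81, " Just"), (20.81, 21.39, " saying,")]
for line in format_words(words):
    print(line)
```

Each word then gets its own begin and end time in the same `begin end text` layout as the segment-level output.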

Dominik Macháček · Answer 3 · Tue Jan 02 2024 22:12:02 GMT+0800 (China Standard Time)

> Hi, are you using --vad ?

Yes, I noticed your command.

You're using the base.en model, which probably performs poorly and causes outages like the ones you report. Use a bigger model for better quality.

Good luck!