ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API

Home Page: https://www.vad.ricky0123.com

Question: Is there a callback function that fires when the positive audio segment becomes larger than minSpeechFrames?

jin60641 opened this issue

When it misfires (the audio segment is smaller than minSpeechFrames):

  1. onSpeechStart
  2. onVADMisfire

When it fires (the audio segment is bigger than minSpeechFrames):

  1. onSpeechStart
  2. onSpeechRealStart (fires when the audio segment with speech probability greater than positiveSpeechThreshold becomes larger than minSpeechFrames)
  3. onSpeechEnd

I need the onSpeechRealStart (provisional name) function, but I can't find it in the documentation.
If it exists, please explain it.
I can implement the function using a userSpeaking flag and a millisecond-based timer, but that doesn't guarantee that the audio segment (frames greater than positiveSpeechThreshold) is larger than minSpeechFrames.
Thanks!

Currently, there is no such function🥲
Could you please explain in more detail why you want that?

Thanks for the reply.
I'd like to perform a high-cost computation with the audio segment onSpeechEnd, and if the user creates a new audio segment, I want to immediately cancel the computation and perform it again with the new audio segment. If there was an onSpeechRealStart event, I could cancel the computation even sooner and save on cost.
(onSpeechStart occurs regardless of minSpeechFrames and can happen before onVADMisfire is called, so it cannot be used as a criterion for cancellation.)
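The cancel-and-restart pattern described here could be sketched roughly as follows. This is only an illustration: `LatestSegmentRunner` and `SegmentTask` are hypothetical names, not part of the library, and it assumes the finished segment arrives as a `Float32Array` and that the expensive computation can observe an `AbortSignal`.

```typescript
// Hypothetical helper for illustration; not part of @ricky0123/vad.
// Assumption: the expensive computation accepts an AbortSignal and stops when it is aborted.
type SegmentTask = (audio: Float32Array, signal: AbortSignal) => Promise<void>;

class LatestSegmentRunner {
  private controller: AbortController | null = null;
  private readonly task: SegmentTask;

  constructor(task: SegmentTask) {
    this.task = task;
  }

  // Abort the computation that is currently running, if any.
  // With an onSpeechRealStart-style event this could be called as soon as
  // a new segment is confirmed, instead of waiting for the next onSpeechEnd.
  cancel(): void {
    if (this.controller !== null) {
      this.controller.abort();
      this.controller = null;
    }
  }

  // Intended to be called from onSpeechEnd with the finished segment.
  // Any previous computation is aborted before the new one starts.
  run(audio: Float32Array): AbortSignal {
    this.cancel();
    const controller = new AbortController();
    this.controller = controller;
    this.task(audio, controller.signal).catch((err: unknown) => {
      // Failures of an aborted run are expected; report only the others.
      if (!controller.signal.aborted) console.error(err);
    });
    return controller.signal;
  }
}
```

Calling `run` from onSpeechEnd always supersedes the previous job; `cancel` is the hook that an earlier "real start" signal would use.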

Thanks. It sounds reasonable to me to add onSpeechRealStart.
Another solution is to call setTimeout in onSpeechStart with a delay slightly longer than the duration of minSpeechFrames, and call clearTimeout in onVADMisfire. (Is this what you are doing now?)

@ricky0123 What do you think?
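A minimal sketch of the timer workaround suggested above, under stated assumptions: `SpeechStartTimer` is a hypothetical helper, and the frame duration, minSpeechFrames value, and margin below are illustrative numbers rather than library defaults. Because the timer measures wall-clock time and not the number of frames above positiveSpeechThreshold, it only approximates the requested behavior.

```typescript
// Hypothetical helper for illustration; not part of @ricky0123/vad.
class SpeechStartTimer {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly delayMs: number;
  private readonly onConfirmed: () => void;

  constructor(delayMs: number, onConfirmed: () => void) {
    this.delayMs = delayMs;
    this.onConfirmed = onConfirmed;
  }

  // True while a speech start is waiting to be confirmed.
  get pending(): boolean {
    return this.timer !== null;
  }

  // Intended to be called from onSpeechStart: (re)start the confirmation timer.
  handleSpeechStart(): void {
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = null;
      this.onConfirmed();
    }, this.delayMs);
  }

  // Intended to be called from onVADMisfire (and optionally onSpeechEnd):
  // drop the pending confirmation so onConfirmed never fires for this segment.
  cancel(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}

// Illustrative numbers only (assumptions, not library defaults).
const frameMs = 96;        // assumed duration of one frame in milliseconds
const minSpeechFrames = 3; // assumed configuration value
const marginMs = 50;       // the "a bit longer" safety margin
const delayMs = minSpeechFrames * frameMs + marginMs;
```

In use, `handleSpeechStart` would be wired to onSpeechStart and `cancel` to onVADMisfire, with `delayMs` passed to the constructor.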

@HayatoYagi That's right. As I wrote in the issue, I used setTimeout to check whether the userSpeaking flag is still true, which works around the problem. However, since this doesn't exactly match minSpeechFrames, I created the issue.
If there is an onSpeechRealStart, it would be more accurate and convenient!

Hi @jin60641, thanks for your comments, this is a very interesting request that I had never thought of. I can see the utility of an onSpeechRealStart callback. Just to clarify, though, minSpeechFrames refers to the minimum number of frames that should have speech probability greater than positiveSpeechThreshold. So, onSpeechRealStart won't always fire minSpeechFrames after the first frame with speech is detected. We need to continuously tally the number of frames with speech probability greater than positiveSpeechThreshold and fire onSpeechRealStart after that tally exceeds minSpeechFrames. I just wanted to clarify that in case anyone starts implementing it.
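The tallying described above could look roughly like this. It is a sketch and not the library's implementation: `SpeechFrameTally` is a hypothetical name, it assumes a per-frame speech probability is available to the caller, and whether the boundary should be `>=` or `>` is an assumption that would need to match the library's misfire check.

```typescript
// Hypothetical sketch; not the library's implementation.
class SpeechFrameTally {
  private count = 0;
  private fired = false;
  private readonly minSpeechFrames: number;
  private readonly positiveSpeechThreshold: number;
  private readonly onSpeechRealStart: () => void;

  constructor(
    minSpeechFrames: number,
    positiveSpeechThreshold: number,
    onSpeechRealStart: () => void,
  ) {
    this.minSpeechFrames = minSpeechFrames;
    this.positiveSpeechThreshold = positiveSpeechThreshold;
    this.onSpeechRealStart = onSpeechRealStart;
  }

  // Feed the speech probability of one frame.
  // Only frames above positiveSpeechThreshold are counted, so frames below
  // the threshold inside the segment delay the callback instead of resetting it.
  pushFrame(speechProbability: number): void {
    if (speechProbability > this.positiveSpeechThreshold) {
      this.count += 1;
    }
    // Assumption: ">=" marks the point at which a misfire is no longer possible.
    if (!this.fired && this.count >= this.minSpeechFrames) {
      this.fired = true; // fire at most once per segment
      this.onSpeechRealStart();
    }
  }

  // Call when the segment ends (speech end or misfire) to start a fresh tally.
  reset(): void {
    this.count = 0;
    this.fired = false;
  }
}
```

With `minSpeechFrames = 3` and a threshold of `0.5`, the probabilities `0.9, 0.2, 0.8, 0.7` trigger the callback on the fourth frame, not the third, because the second frame is not counted.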

@ricky0123
Thanks!
Your explanation matches my intention for onSpeechRealStart.
The positiveSpeechThreshold is already mentioned in the documentation, so I didn't specifically mention it here. It's correct that it's not just when the size of the audio segment is larger than minSpeechFrames, but rather when the size of the audio segment that satisfies the positiveSpeechThreshold is larger than minSpeechFrames!