Question: Is there a callback function that fires when the positive audio segment becomes larger than minSpeechFrames?

jin60641 opened this issue 7 months ago

jinsang commented 7 months ago

On a misfire (the audio segment is shorter than minSpeechFrames):

onSpeechStart
onVADMisfire

On a successful detection (the audio segment is longer than minSpeechFrames):

onSpeechStart
onSpeechRealStart (this callback would fire as soon as the audio segment above positiveSpeechThreshold becomes longer than minSpeechFrames)
onSpeechEnd
onSpeechEnd

I need the onSpeechRealStart function (a provisional name), but I can't find anything like it in the documentation.
If it exists, please explain how to use it.
I can approximate it with a userSpeaking flag and a millisecond-based timer, but a timer does not guarantee that the audio segment above positiveSpeechThreshold is actually longer than minSpeechFrames.
Thanks!

8trees · Answer 1 · Tue Dec 19 2023 16:23:33 GMT+0800 (China Standard Time)

Currently, there is no such function🥲
Could you please explain in more detail why you want that?

jinsang · Answer 2 · Tue Dec 19 2023 16:40:44 GMT+0800 (China Standard Time)

Thanks for the reply.
I'd like to run a high-cost computation on the audio segment in onSpeechEnd. If the user then starts a new audio segment, I want to cancel that computation immediately and run it again on the new segment. With an onSpeechRealStart event, I could cancel the computation sooner and save on cost.
(onSpeechStart fires regardless of minSpeechFrames and can be followed by onVADMisfire, so it cannot be used as the criterion for cancellation.)
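A minimal sketch of the cancel-and-restart pattern described above, assuming the expensive work can be split into chunks. `LatestJob` and `expensiveCompute` are hypothetical names used only for illustration, and `onSpeechRealStart` is the callback proposed in this thread, not an existing one.

```typescript
// Hypothetical helper, not part of the VAD library: tracks which job is current.
class LatestJob {
  private generation = 0;

  // Start a new job. The returned function reports whether that job is stale.
  begin(): () => boolean {
    const mine = ++this.generation;
    return () => mine !== this.generation;
  }

  // Invalidate the job currently in flight, if any.
  cancel(): void {
    this.generation++;
  }
}

// Stand-in for the expensive computation. It yields to the event loop between
// chunks so that VAD callbacks get a chance to run and cancel it.
async function expensiveCompute(
  audio: Float32Array,
  isStale: () => boolean,
  chunkSize = 4096,
): Promise<number | null> {
  let energy = 0;
  for (let start = 0; start < audio.length; start += chunkSize) {
    if (isStale()) return null; // abandoned: a newer segment has started
    const end = Math.min(start + chunkSize, audio.length);
    for (let i = start; i < end; i++) energy += audio[i] * audio[i];
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return isStale() ? null : energy;
}

const job = new LatestJob();

// Callbacks in the shape discussed in this thread.
const callbacks = {
  onSpeechEnd: (audio: Float32Array) => {
    // Starting a new job also supersedes any older one.
    void expensiveCompute(audio, job.begin());
  },
  // Proposed in this thread; not an existing library callback.
  onSpeechRealStart: () => job.cancel(),
};
```

The generation counter means a superseded job simply notices it is stale at its next check, so no shared cancellation state has to be cleaned up.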

8trees · Answer 3 · Tue Dec 19 2023 18:53:26 GMT+0800 (China Standard Time)

Thanks. It sounds reasonable to me to add onSpeechRealStart.
Another solution is to call setTimeout in onSpeechStart with a delay slightly longer than the duration of minSpeechFrames, and call clearTimeout in onVADMisfire. (Is this what you are doing now?)

@ricky0123 What do you think?
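The timer workaround suggested above could look roughly like this. It is only a sketch: `makeTimerWorkaround`, `TimerApi`, and the `frameMs` parameter are hypothetical names, and the real frame duration depends on the VAD's frame size and sample rate. The one-frame margin is an arbitrary choice for "a bit longer".

```typescript
// Injectable timer functions, so the logic can be exercised without real delays.
type TimerApi = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Hypothetical helper: approximates onSpeechRealStart with a timer.
function makeTimerWorkaround(
  minSpeechFrames: number,
  frameMs: number, // assumed duration of one VAD frame in milliseconds
  onSpeechRealStart: () => void,
  timers: TimerApi = realTimers,
) {
  let handle: unknown = null;

  const cancel = () => {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
  };

  // Wait one frame longer than minSpeechFrames worth of audio.
  const delayMs = (minSpeechFrames + 1) * frameMs;

  return {
    onSpeechStart: () => {
      cancel(); // never keep two timers alive at once
      handle = timers.set(() => {
        handle = null;
        onSpeechRealStart();
      }, delayMs);
    },
    onVADMisfire: cancel,
    onSpeechEnd: cancel,
  };
}
```

Because this measures wall-clock time rather than counting frames above positiveSpeechThreshold, it only approximates the requested behavior.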

jinsang · Answer 4 · Tue Dec 19 2023 20:46:22 GMT+0800 (China Standard Time)

@HayatoYagi That's right. As I wrote in the issue, I used setTimeout to check whether the userSpeaking flag is still true, which works around the problem. However, since this doesn't exactly match minSpeechFrames, I opened this issue.
An onSpeechRealStart callback would be more accurate and convenient!

Ricky Samore · Answer 5 · Wed Dec 20 2023 01:52:06 GMT+0800 (China Standard Time)

Hi @jin60641, thanks for your comments, this is a very interesting request that I had never thought of. I can see the utility of an onSpeechRealStart callback. Just to clarify, though, minSpeechFrames refers to the minimum number of frames that should have speech probability greater than positiveSpeechThreshold. So, onSpeechRealStart won't always fire minSpeechFrames after the first frame with speech is detected. We need to continuously tally the number of frames with speech probability greater than positiveSpeechThreshold and fire onSpeechRealStart after that tally exceeds minSpeechFrames. I just wanted to clarify that in case anyone starts implementing it.
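To illustrate the tally described above, here is a simplified sketch. `SpeechTally` is a hypothetical class, not the library's implementation: it ignores how a segment ends, and firing when the tally reaches minSpeechFrames (`>=` rather than `>`) is an assumption.

```typescript
type TallyEvent = "speechStart" | "speechRealStart";

// Hypothetical illustration of the frame tally; not the library's code.
class SpeechTally {
  private readonly positiveSpeechThreshold: number;
  private readonly minSpeechFrames: number;
  private speaking = false;
  private positiveFrames = 0;
  private realStartFired = false;

  constructor(positiveSpeechThreshold: number, minSpeechFrames: number) {
    this.positiveSpeechThreshold = positiveSpeechThreshold;
    this.minSpeechFrames = minSpeechFrames;
  }

  // Feed one frame's speech probability; returns the events that frame triggers.
  pushFrame(probability: number): TallyEvent[] {
    const events: TallyEvent[] = [];
    if (probability > this.positiveSpeechThreshold) {
      if (!this.speaking) {
        this.speaking = true;
        events.push("speechStart");
      }
      // Count every positive frame, consecutive or not.
      this.positiveFrames++;
      // Assumption: fire once when the tally reaches minSpeechFrames.
      if (!this.realStartFired && this.positiveFrames >= this.minSpeechFrames) {
        this.realStartFired = true;
        events.push("speechRealStart");
      }
    }
    return events;
  }

  // Call when the segment ends (speech end or misfire) to start over.
  reset(): void {
    this.speaking = false;
    this.positiveFrames = 0;
    this.realStartFired = false;
  }
}
```

With a threshold of 0.5 and minSpeechFrames of 3, feeding probabilities 0.9, 0.2, 0.9, 0.9 yields "speechStart" on the first frame and "speechRealStart" on the fourth, which is later than three frames after the start because the low-probability frame does not count toward the tally.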

jinsang · Answer 6 · Wed Dec 20 2023 08:46:46 GMT+0800 (China Standard Time)

@ricky0123
Thanks!
Your explanation matches my intention for onSpeechRealStart.
The positiveSpeechThreshold is already mentioned in the documentation, so I didn't specifically mention it here. It's correct that it's not just when the size of the audio segment is larger than minSpeechFrames, but rather when the size of the audio segment that satisfies the positiveSpeechThreshold is larger than minSpeechFrames!