ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API

Home Page: https://www.vad.ricky0123.com

How to handle the silence for a couple of seconds

LordHydra1 opened this issue

Hey @ricky0123,

I'm currently exploring options to incorporate a delay into the onSpeechEnd function. Specifically, as I develop a voice chat application, I'm aiming to avoid immediate audio transmission during pauses by introducing a delay before triggering the onSpeechEnd function.

For example, I'm considering using setTimeout to achieve this delay. (This is only to illustrate what I mean. I know it can't actually be written like this, but maybe there is a way to do something similar.)

setTimeout(() => {
  onSpeechEnd: (audio) => {
    console.log("Speech end");
    const wavBuffer = utils.encodeWAV(audio);
    console.log("test", wavBuffer);
    const base64 = utils.arrayBufferToBase64(wavBuffer);
    const url = `data:audio/wav;base64,${base64}`;
    setAudioList((old) => [url, ...old]);
  },
}, 2000);

Alternatively, would it be possible to add a delay option that takes a number of milliseconds and runs the onSpeechEnd function only after that delay has passed?

const vad = useMicVAD({
  delay: 3000, // Triggers onSpeechEnd after a 3-second pause (when you stop talking)
  workletURL: "/static/js/vad.worklet.bundle.min.js",
  modelURL: "/static/js/silero_vad.onnx",
  onVADMisfire: () => console.log("VAD misfire"),
  onSpeechStart: () => console.log("Speech start"),
  onSpeechEnd: (audio) => {
    console.log("Speech end");
    const wavBuffer = utils.encodeWAV(audio);
    console.log("test", wavBuffer);
    const base64 = utils.arrayBufferToBase64(wavBuffer);
    const url = `data:audio/wav;base64,${base64}`;
    const post = makePost(wavBuffer);
    console.log(post);
    setAudioList((old) => [url, ...old]);
  },
  onError: (e) => setError(e.message),
});

Either approach would give more flexibility in handling pauses and would improve the user experience during voice interactions.
If an approach for this already exists and I have simply missed it, I'd be glad to use it. Could you provide a small snippet in that case?
Thanks for the amazing work, and I hope for a quick answer <3.

Hi @LordHydra1, do you mean that you would like the voice activity algorithm to wait for a period of time at the end of speech to see whether the user resumes speaking after a short pause? If so, you are looking for redemptionFrames. You can see the relevant documentation here. I believe the default value is 8 (8 frames, each frame being 1536 samples of audio, or ~100 milliseconds). You can try increasing it.
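
To make the frame arithmetic above concrete, here is a minimal sketch that converts a desired pause length into a redemptionFrames value. It assumes 16 kHz input audio (not stated in this thread) and the 1536-sample frame size quoted above. pauseMsToRedemptionFrames is a hypothetical helper, not part of the library, and the actual frame size and default may differ between library versions.

```javascript
// Assumptions: 16 kHz audio and 1536 samples per frame (the frame size quoted
// in the reply above). Check both against the version of the library in use.
const SAMPLE_RATE_HZ = 16000;
const FRAME_SAMPLES = 1536;

// Hypothetical helper (not a library function): converts a pause length in
// milliseconds into a whole number of frames, rounding up so that the wait is
// never shorter than requested.
function pauseMsToRedemptionFrames(pauseMs) {
  return Math.ceil((pauseMs * SAMPLE_RATE_HZ) / (FRAME_SAMPLES * 1000));
}

const redemptionFrames = pauseMsToRedemptionFrames(2000);
console.log(redemptionFrames);

// The value would then be passed next to the existing options, for example:
// useMicVAD({ redemptionFrames, onSpeechEnd: (audio) => { /* ... */ } })
```

Rounding up errs on the side of waiting slightly longer rather than cutting the speaker off early. If the end-of-speech callback still fires too soon, increasing the value further is the simplest adjustment.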