How to handle the silence for a couple of seconds
LordHydra1 opened this issue
Hey @ricky0123,
I'm currently exploring options for adding a delay to the onSpeechEnd function. I'm developing a voice chat application, and I want to avoid sending audio immediately during short pauses by introducing a delay before onSpeechEnd is triggered.
For example, I'm considering using setTimeout to achieve this delay. (This is only to better explain what I mean. I know it can't be done like this, but maybe there is a way to do something similar.)
setTimeout(() => {
onSpeechEnd: (audio) => {
console.log("Speech end");
const wavBuffer = utils.encodeWAV(audio);
console.log("test", wavBuffer);
const base64 = utils.arrayBufferToBase64(wavBuffer);
const url = `data:audio/wav;base64,${base64}`;
setAudioList((old) => [url, ...old]);
},
}, 2000);
Alternatively, would it be possible to implement a delay option that, given a number of milliseconds, runs the onSpeechEnd function only after that delay has passed?
const vad = useMicVAD({
delay: 3000, // Triggers onSpeechEnd after a 3-second pause (when you stop talking)
workletURL: "/static/js/vad.worklet.bundle.min.js",
modelURL: "/static/js/silero_vad.onnx",
onVADMisfire: () => console.log("VAD misfire"),
onSpeechStart: () => console.log("Speech start"),
onSpeechEnd: (audio) => {
console.log("Speech end");
const wavBuffer = utils.encodeWAV(audio);
console.log("test", wavBuffer);
const base64 = utils.arrayBufferToBase64(wavBuffer);
const url = `data:audio/wav;base64,${base64}`;
const post = makePost(wavBuffer);
console.log(post);
setAudioList((old) => [url, ...old]);
},
onError: (e) => setError(e.message),
});
These approaches offer flexibility in managing delays and enhancing the user experience during voice interactions.
I'm open to hearing whether an approach for this already exists that I have overlooked. If there is one, could you provide a small snippet?
Thanks for the amazing work and I hope for a quick answer <3.
Hi @LordHydra1, do you mean that you would like the voice activity algorithm to wait for a period of time at the end of speech to see if the user resumes speaking after a short pause? If so, you are looking for redemptionFrames. You can see the relevant documentation here. I believe the default value is 8 (8 frames, each frame being 1536 samples, or ~100 milliseconds, of audio). You can try increasing it.
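As a rough sketch of what that could look like: the helper below converts a desired pause length into a redemptionFrames value. framesForPause is a hypothetical name, not part of the library, and the 16 kHz sample rate is an assumption (it is the rate at which 1536 samples come out to 96 ms, matching the ~100 ms per frame mentioned in the reply above). Check the frame size and sample rate of the version you are using before relying on the numbers.

```javascript
// Hypothetical helper (not part of the library): converts a desired pause
// length in milliseconds into a redemptionFrames value.
// Assumptions: 1536 samples per frame (from the reply above) and a 16 kHz
// sample rate, which gives 96 ms per frame.
function framesForPause(pauseMs, frameSamples = 1536, sampleRate = 16000) {
  // Multiply before dividing, then round up so the wait is never shorter
  // than the requested pause.
  return Math.ceil((pauseMs * sampleRate) / (frameSamples * 1000));
}

// Options to merge into the existing useMicVAD({ ... }) call, in place of
// the proposed `delay` option.
const vadOptions = {
  redemptionFrames: framesForPause(3000), // roughly 3 seconds of silence
};

console.log(vadOptions.redemptionFrames); // 32
```

With these assumptions, a 3-second pause maps to 32 frames. The callbacks (onSpeechStart, onSpeechEnd, and so on) stay exactly as in the original snippet; only the option controlling how long the algorithm waits changes.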