ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API

Home Page: https://www.vad.ricky0123.com

Speech after a particular interval

IrfanAli17899 opened this issue

Hi, first of all, thank you so much for this awesome package. It's working great for me. The only thing I'm wondering is whether I can get the speech every 5 seconds instead of only after a pause. Right now, if the user speaks continuously, no results come back until they stop, which shows up as latency in the UI. Please help, and thanks in advance.

Thanks for your proposal! Your idea is nice, but it has a problem: the audio can be split in the middle of a word.
I'm not sure it should be implemented.

When I wanted to make each speech segment shorter, I tried reducing redemptionFrames and increasing negativeSpeechThreshold.
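
For anyone who wants to try that tuning, here is a minimal sketch. It only builds an options object and does not call the package. `redemptionFrames` and `negativeSpeechThreshold` are the option names mentioned above; the helper name `shorterSegmentOptions`, the `positiveSpeechThreshold` cap, and all the numbers are assumptions made up for illustration, not recommended values.

```javascript
// Hypothetical helper: derive options that should end speech segments sooner.
// The scaling factors are illustrative assumptions, not library defaults.
function shorterSegmentOptions(base) {
  return {
    ...base,
    // Fewer redemption frames: less waiting before a segment is closed.
    redemptionFrames: Math.max(1, Math.floor(base.redemptionFrames / 2)),
    // Higher negative threshold: more frames count as non-speech.
    // Capped so it never exceeds the positive threshold.
    negativeSpeechThreshold: Math.min(
      base.positiveSpeechThreshold,
      base.negativeSpeechThreshold + 0.1
    ),
  };
}

// Example starting values (assumed for the sketch).
const tuned = shorterSegmentOptions({
  positiveSpeechThreshold: 0.5,
  negativeSpeechThreshold: 0.35,
  redemptionFrames: 8,
});
console.log(tuned);
```

The resulting object could then be spread into whatever options you already pass when creating the VAD. Shorter segments will probably still cut at pauses rather than at fixed intervals, so this only reduces the latency, it does not bound it.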

Hi @IrfanAli17899, I'm glad this package has been useful for you. So in other words, throughout a continuous speech period, you would like to have a callback that runs on a regular 5 second interval and takes as an argument the current raw audio of the speech segment? Can I ask what kind of UI updates you are referring to? I would like to understand the use case better.

Hi @ricky0123, yes, you are right. I am feeding the audio I get from your package to GPT for transcription and translation, and then I show those results on the frontend. I am trying to make a real-time translator. The problem is that the library doesn't provide audio segments until the user stops speaking. I want a smooth audio segment at a regular interval, so that if the user speaks continuously without stopping I can still show the transcription and translation results. Does that make sense? Let me know if you need more explanation of the use case. Thanks very much.

Hi @IrfanAli17899, thanks for the clarification. Have you considered streaming audio from the browser to your server and doing all of the audio processing there, instead of using this package?

Potentially what we could do is provide a method on the vad object that allows you to get the current audio segment. That would allow you to experiment by creating a timer that queries the current audio and sends it to your server.

Yeah, I tried the browser MediaRecorder API to stream audio to the server. Only the first chunk is playable, because it carries all the necessary headers; the rest of the chunks are not. So I had to merge all the chunks on the backend and then crop the last 5 seconds for transcription. It was a very lengthy, hectic solution, which is why I tried your package.

Yes, it would be very helpful to have a prop that takes a callback function and another prop for the interval, so that the package gives me chunks. Each chunk should be playable on its own, I guess. Then it would be useful for me. Let me know what you think, thanks @ricky0123.
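
On the "playable on its own" point: if the audio is available as raw samples, each chunk can be made self-contained by wrapping it in its own WAV header. Below is a rough sketch of that idea, assuming the samples arrive as a mono `Float32Array` in the range -1 to 1 and that the sample rate is known (16000 is just the value assumed here). The function name `encodeWav` is made up for this example and is not taken from the package.

```javascript
// Wrap mono Float32 samples (range -1..1) in a 16-bit PCM WAV container.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeText = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeText(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus 8 bytes
  writeText(8, "WAVE");
  writeText(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeText(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// One second of silence at 16 kHz (sample rate assumed for the sketch).
const wav = encodeWav(new Float32Array(16000), 16000);
console.log(wav.byteLength); // 44-byte header + 32000 bytes of samples
```

Because every chunk carries its own header, the server would not need to merge chunks before decoding them, which is the problem described with MediaRecorder above.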

Hi @IrfanAli17899 what I'm saying is that we probably won't add a callback that runs on an interval, but I would be open to adding a method for you to get the raw audio at any given time, so you could do something like

const myvad = await vad.MicVAD.new(...)
// getCurrentAudio() is the proposed method; it does not exist yet
const mytimer = setInterval(() => {
    const audio = myvad.getCurrentAudio()
    // send audio to server, etc
}, myIntervalLength)

This would be easy to implement and more general. I'm not sure if the method you're describing of sending audio to your server on an interval will work, but this would at least allow you to try it out.
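
To experiment with the idea before such a method exists, one option is to collect audio frames yourself and read them on a timer. The sketch below covers only the buffering part. `SegmentBuffer` is a made-up name, not part of the package, and how frames get fed into it (for example from some per-frame callback) is left as an assumption.

```javascript
// Hypothetical rolling buffer; not part of the package.
class SegmentBuffer {
  constructor() {
    this.frames = [];
    this.length = 0;
  }

  // Append one frame of samples (copied so later mutation is harmless).
  push(frame) {
    this.frames.push(Float32Array.from(frame));
    this.length += frame.length;
  }

  // Return everything collected so far as one Float32Array.
  snapshot() {
    const out = new Float32Array(this.length);
    let offset = 0;
    for (const frame of this.frames) {
      out.set(frame, offset);
      offset += frame.length;
    }
    return out;
  }

  // Forget collected audio, e.g. once a speech segment has ended.
  reset() {
    this.frames = [];
    this.length = 0;
  }
}

const buffer = new SegmentBuffer();
buffer.push(new Float32Array([0.1, 0.2]));
buffer.push(new Float32Array([0.3]));
console.log(buffer.snapshot().length); // 3
```

A timer could then call `buffer.snapshot()` every few seconds and send the result to the server, with `buffer.reset()` called whenever a segment ends. Note that a snapshot taken this way can still cut a word in half, as pointed out earlier in the thread.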

Hi @ricky0123, sorry, will that audio have been processed by the VAD model?

@ricky0123 Hello Ricky, thanks for the great package. Has the feature above been implemented in the package? I am looking to do the same thing: a real-time stream of audio to the server!

@ricky0123 Hi, is it done?