chengsokdara / use-whisper

React hook for OpenAI Whisper with speech recorder, real-time transcription, and silence removal built-in

Using streaming + onTranscribe (custom server) together?

sgrove opened this issue · comments

Very impressed by this project, thank you so much for it!

Is there some way to stream the audio to a server endpoint (as in the examples) while also having it iteratively return results? Right now it seems that if streaming: true is set, the hook will only hit the Whisper API directly from the frontend (e.g. https://api.openai.com/v1/audio/transcriptions).

That means there's quite a long pause between the end of recording and getting the result (since ffmpeg has to run at the end, and then a fairly large file has to be uploaded before the transcription comes back). I'm curious whether there's a way to avoid that with the current design?

Hello @sgrove

  • if you want to send streaming audio to your own server, you can use onDataAvailable (see the fuller sketch after this list)
  • when you pass streaming = true, onDataAvailable will be called at an interval based on timeSlice.
const streamToServer = (blob) => {
  // send a chunk of audio to your server
  // the implementation is up to you; see the onDataAvailable source code for more details
}

const { transcript } = useWhisper({ streaming: true, onDataAvailable: streamToServer })
  • ffmpeg only converts the audio if you pass removeSilence = true

  • when streaming = true, ffmpeg won't do anything

  • for streaming, I tried sending chunks of audio to be transcribed one by one, but the transcription was not good, so currently it concatenates the audio blobs in succession and sends them to Whisper.

  • I am still looking for a better way to do streaming; if you have a better idea, it is very welcome.
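
For reference, here is a minimal sketch of what streamToServer could look like with streaming enabled, assuming a hypothetical /api/transcribe-chunk endpoint on your own backend; the endpoint path and payload format are placeholders, not part of use-whisper:

import { useWhisper } from '@chengsokdara/use-whisper'

// Hypothetical endpoint on your own server that accepts one audio chunk per request.
const streamToServer = async (chunk: Blob) => {
  await fetch('/api/transcribe-chunk', {
    method: 'POST',
    headers: { 'Content-Type': chunk.type || 'audio/webm' },
    body: chunk,
  })
}

const App = () => {
  const { transcript } = useWhisper({
    streaming: true,
    timeSlice: 1_000, // onDataAvailable fires roughly once per second
    onDataAvailable: streamToServer,
    // other options (apiKey or onTranscribe, removeSilence, ...) omitted for brevity
  })
  return <p>{transcript.text}</p>
}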

@chengsokdara Great project! Is it possible for you to create a Next.js example of this working while only exposing the OpenAI API key to the server?

@haluvibe yes, I will add that later when this is a bit more stable; currently I am trying to make it truly cross-browser.
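
In the meantime, one possible shape for this (a rough sketch, not an official example from the library) is a Next.js API route that proxies the recorded audio to Whisper so the OpenAI key stays in a server-side environment variable. The route path pages/api/whisper.ts, the OPENAI_API_KEY variable name, and the raw-body transport are assumptions, and it relies on Node 18+ for global fetch, FormData, and Blob:

// pages/api/whisper.ts -- hypothetical route; the OpenAI key never leaves the server
import type { NextApiRequest, NextApiResponse } from 'next'

// disable Next.js body parsing so the raw audio bytes can be read from the request stream
export const config = { api: { bodyParser: false } }

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // collect the raw audio body sent by the client
  const chunks: Buffer[] = []
  for await (const chunk of req) chunks.push(chunk as Buffer)
  const audio = Buffer.concat(chunks)

  // forward it to Whisper with the server-side key
  const form = new FormData()
  form.append('file', new Blob([audio], { type: 'audio/webm' }), 'speech.webm')
  form.append('model', 'whisper-1')

  const upstream = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: form,
  })
  const { text } = await upstream.json()
  res.status(200).json({ text })
}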

Hello @chengsokdara ,

Thanks for this great lib!

I have already set up my custom server: after the user stops speaking, the request is made and I get the response.

So the user has to wait for it.

I can see your comment above about adding onDataAvailable to stream the response.

But in your code at

setTranscript((prev) => ({ ...prev, text }))

you set the transcript, so if we add our own onTranscribe, do we just have to return the text response from our custom server?

If you have a template or snippet showing exactly what we can do to have streaming + onTranscribe (custom server), that would be great for us.

Thanks
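
A rough sketch of what that could look like on the client, assuming onTranscribe receives the recorded Blob and that resolving to an object with blob and text is what ends up in the setTranscript call quoted above; the /api/whisper path is a placeholder that pairs with a server route like the Next.js sketch earlier:

import { useWhisper } from '@chengsokdara/use-whisper'

// Hypothetical route on your own server that accepts raw audio and replies with { text }.
const onTranscribe = async (blob: Blob) => {
  const res = await fetch('/api/whisper', {
    method: 'POST',
    headers: { 'Content-Type': blob.type || 'audio/webm' },
    body: blob,
  })
  const { text } = await res.json()
  // assumption: the hook uses the returned { blob, text } to update its transcript state
  return { blob, text }
}

const App = () => {
  const { transcript } = useWhisper({
    // apiKey omitted here on the assumption that the custom server handles transcription
    onTranscribe,
  })
  return <p>{transcript.text}</p>
}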

It seems the PR at #33 can allow streaming + onTranscribe (custom server) to work together.