go-whisper golang bindings
djthorpe opened this issue
Create bindings for https://github.com/ggerganov/whisper.cpp
- Simple golang bindings with tests
- Some examples (main, sample) based off of these
- Integrate with ffmpeg for audio conversion
- Some sort of real-time translation
- gRPC and/or websocket API
- Docker image of a speech-to-text service
Made PR:
ggerganov/whisper.cpp#269
Great work!
Keen on realtime translation and a way of calling out/streaming the output to another app - gRPC seems the best option for this
Yeah thanks.
I'm doing the audio downsampling to 16 kHz at the moment in a different repository (go-media)
The real-time transcription and translation should be fairly straightforward to add, but it's still pretty experimental, even for whisper.cpp
It will take me a while to get to the gRPC microservice :-(
Added a "stream" command for the start of real-time streaming, but:
- Thread safety: Needs some work to ensure the same model can be used in the process method across threads/goroutines
- Ring buffer: Implement a ring buffer for continuous audio samples
- Overlaps: Need some word overlaps to ensure we don't lose words between sample windows
- Silence: Don't process audio when silence is fed in. Ideally, split windows where there is a longish (>1 s) silence
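To make the ring buffer, overlap and silence items above a bit more concrete, here is a rough sketch in Go. It is not the code in the "stream" command: `SampleRing`, `Next` and `IsSilent` are hypothetical names made up for illustration, the silence check is a plain RMS threshold (one possible approach, not necessarily the right one), and samples are assumed to be mono float32. In practice the window and overlap would be counted in samples at 16 kHz rather than the tiny numbers used here.

```go
package main

import (
	"fmt"
	"math"
)

// SampleRing is a hypothetical fixed-capacity ring buffer of mono samples.
// When it is full, new samples overwrite the oldest ones.
type SampleRing struct {
	buf   []float32
	start int // index of the oldest buffered sample
	size  int // number of buffered samples
}

// NewSampleRing allocates a ring that holds up to capacity samples.
func NewSampleRing(capacity int) *SampleRing {
	return &SampleRing{buf: make([]float32, capacity)}
}

// Write appends samples, dropping the oldest ones once the ring is full.
func (r *SampleRing) Write(samples []float32) {
	for _, s := range samples {
		end := (r.start + r.size) % len(r.buf)
		r.buf[end] = s
		if r.size < len(r.buf) {
			r.size++
		} else {
			r.start = (r.start + 1) % len(r.buf)
		}
	}
}

// Next copies out the oldest window samples, then advances by
// window-overlap so the last overlap samples are returned again in the
// following window. It returns false if fewer than window samples are
// buffered or the arguments are invalid.
func (r *SampleRing) Next(window, overlap int) ([]float32, bool) {
	if window <= 0 || overlap < 0 || overlap >= window || r.size < window {
		return nil, false
	}
	out := make([]float32, window)
	for i := range out {
		out[i] = r.buf[(r.start+i)%len(r.buf)]
	}
	step := window - overlap
	r.start = (r.start + step) % len(r.buf)
	r.size -= step
	return out, true
}

// IsSilent reports whether the RMS level of samples is below threshold.
// An empty slice counts as silent.
func IsSilent(samples []float32, threshold float64) bool {
	if len(samples) == 0 {
		return true
	}
	var sum float64
	for _, s := range samples {
		sum += float64(s) * float64(s)
	}
	return math.Sqrt(sum/float64(len(samples))) < threshold
}

func main() {
	ring := NewSampleRing(8)
	ring.Write([]float32{1, 2, 3, 4, 5, 6})
	for {
		w, ok := ring.Next(4, 2)
		if !ok {
			break
		}
		// A real loop would skip silent windows instead of printing them.
		fmt.Println(w, IsSilent(w, 0.01))
	}
}
```

The sketch is not safe for concurrent use, so a capture goroutine and a processing goroutine would still need a mutex or a channel between them. It also overlaps by sample count, not by words, so it only approximates the word overlap idea.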
There are also some issues with the segmenting in the main package (repeated segments come out!) that need fixing.