djthorpe / go-whisper

Speech-to-Text in golang

go-whisper golang bindings

djthorpe opened this issue · comments

Create bindings for https://github.com/ggerganov/whisper.cpp

  • Simple golang bindings with tests
  • Some examples (main, sample) based off of these
  • Integrate with ffmpeg for audio conversion
  • Some sort of real-time translation
  • gRPC and/or websocket API
  • Docker image of a speech-to-text service

Great work!

Keen on realtime translation and a way of calling out/streaming the output to another app - gRPC seems the best option for this

Yeah thanks.

I'm doing the audio downsampling to 16 kHz at the moment in a different repository (go-media)

The realtime transcription and translation should be fairly straightforward to add, but they are still pretty experimental, even in whisper.cpp

It will take me a while to get to the gRPC microservice :-(

Added a "stream" command for the start of real-time streaming, but:

  • Thread safety: Needs some work to ensure the same model can be used in the process method across threads/goroutines
  • Ring buffer: Implement a ring buffer for continuous audio samples
  • Overlaps: Need some word overlaps to ensure we don't lose words between sample windows
  • Silence: Don't process audio when silence is fed in. Ideally chunk windows when there is a largish (>1s) silence
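
To make the ring buffer, overlap and silence points above a bit more concrete, here is a minimal sketch in Go. It is not the code in go-whisper: the names Ring, Window and IsSilent are hypothetical, and the RMS threshold is an assumed stand-in for whatever silence detection ends up being used. The buffer is also not safe for concurrent use as written, so it would need a mutex (or a single owning goroutine) if shared.

```go
package main

import (
	"fmt"
	"math"
)

// Ring is a hypothetical fixed-capacity ring buffer of float32 audio samples.
// When it is full, new samples overwrite the oldest ones.
// It is NOT safe for concurrent use without external locking.
type Ring struct {
	buf   []float32
	start int // index of the oldest sample
	size  int // number of valid samples
}

// NewRing allocates a ring buffer holding at most capacity samples.
func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]float32, capacity)}
}

// Len returns the number of samples currently buffered.
func (r *Ring) Len() int { return r.size }

// Write appends samples, dropping the oldest ones once the buffer is full.
func (r *Ring) Write(samples []float32) {
	for _, s := range samples {
		end := (r.start + r.size) % len(r.buf)
		r.buf[end] = s
		if r.size < len(r.buf) {
			r.size++
		} else {
			r.start = (r.start + 1) % len(r.buf)
		}
	}
}

// Window copies the oldest n samples and then discards only n-overlap of
// them, so the last overlap samples are seen again by the next window.
// It returns false when fewer than n samples are buffered.
func (r *Ring) Window(n, overlap int) ([]float32, bool) {
	if n <= 0 || overlap < 0 || overlap >= n || r.size < n {
		return nil, false
	}
	out := make([]float32, n)
	for i := range out {
		out[i] = r.buf[(r.start+i)%len(r.buf)]
	}
	step := n - overlap
	r.start = (r.start + step) % len(r.buf)
	r.size -= step
	return out, true
}

// IsSilent reports whether the RMS level of samples is below threshold.
// The threshold is an assumption and would need tuning on real audio.
func IsSilent(samples []float32, threshold float64) bool {
	if len(samples) == 0 {
		return true
	}
	var sum float64
	for _, s := range samples {
		sum += float64(s) * float64(s)
	}
	return math.Sqrt(sum/float64(len(samples))) < threshold
}

func main() {
	r := NewRing(16)
	for i := 0; i < 10; i++ {
		r.Write([]float32{float32(i)})
	}
	// Pull windows of 4 samples that overlap by 2, skipping silent ones.
	for {
		w, ok := r.Window(4, 2)
		if !ok {
			break
		}
		if IsSilent(w, 0.5) {
			continue
		}
		fmt.Println(w) // a real program would hand w to the model here
	}
}
```

With real 16 kHz audio the window and overlap would be expressed in samples (for example a few seconds of window with a sub-second overlap), and those sizes are tuning choices rather than anything fixed by whisper.cpp.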

There are also some issues with the segmenting in the main package (repeated segments come out!) that need fixing.