Whisper VAD

Whisper.cpp speech-to-text engine combined with the Silero Voice Activity Detector (VAD). The combination improves transcription speed and quality, and can reduce hallucinated output from the model.
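As a rough illustration of why VAD helps: a detector such as Silero reports speech regions as start/end sample offsets, and only those regions need to reach the recognizer. The sketch below pads and merges such regions into larger chunks. It is not taken from whisper_vad.py; the function name, the 16 kHz sample rate, and the gap and padding defaults are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

SAMPLE_RATE = 16000  # assumption: audio is 16 kHz mono


def merge_speech_segments(
    segments: List[Dict[str, int]],
    max_gap: int = SAMPLE_RATE // 2,   # hypothetical default: merge across gaps <= 0.5 s
    padding: int = SAMPLE_RATE // 10,  # hypothetical default: keep 0.1 s of context per side
    total_samples: Optional[int] = None,
) -> List[Dict[str, int]]:
    """Pad VAD speech regions and merge the ones that end up close together.

    `segments` is a list of {"start": ..., "end": ...} dicts in samples.
    """
    merged: List[Dict[str, int]] = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        # Widen the region slightly so word onsets and tails are not clipped.
        start = max(0, seg["start"] - padding)
        end = seg["end"] + padding
        if total_samples is not None:
            end = min(end, total_samples)

        if merged and start - merged[-1]["end"] <= max_gap:
            # Close enough to the previous chunk: extend it instead of starting a new one.
            merged[-1]["end"] = max(merged[-1]["end"], end)
        else:
            merged.append({"start": start, "end": end})
    return merged


if __name__ == "__main__":
    speech = [
        {"start": 16000, "end": 32000},
        {"start": 36000, "end": 48000},
        {"start": 160000, "end": 176000},
    ]
    for chunk in merge_speech_segments(speech):
        print(chunk["start"] / SAMPLE_RATE, chunk["end"] / SAMPLE_RATE)
```

Each merged chunk can then be transcribed on its own, with its start offset added back to the resulting timestamps. Long silent stretches never reach the model.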

Run whisper_vad.py directly to transcribe video or audio files into SRT subtitles, or import it as a library.
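For readers unfamiliar with the SRT format, the following sketch shows how timed text segments map to subtitle entries. It is an illustration only and not the code used by whisper_vad.py; the helper names and the (start, end, text) tuple layout are assumptions.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each entry: a 1-based counter, a time range line, then the subtitle text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (2.0, 3.25, "World")]))
```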

Dependencies

ffmpeg (command)
openblas (system library)
cffi
torch
scipy
zhconv (Chinese post-processing)
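The ffmpeg command is what lets arbitrary video or audio files serve as input. As a hedged sketch of how such a dependency is commonly used (not necessarily how whisper_vad.py invokes it), the snippet below decodes any input file to raw 16 kHz mono 16-bit PCM on stdout. The function names are hypothetical and the 16 kHz target is an assumption.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_decode_command(path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", path,
        "-vn",                    # ignore any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-",                      # write to stdout
    ]


def decode_to_pcm(path: str, sample_rate: int = 16000) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    result = subprocess.run(
        ffmpeg_decode_command(path, sample_rate),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

The returned bytes can be turned into a float array with, for example, numpy.frombuffer(pcm, dtype=numpy.int16) / 32768.0 before being handed to a VAD or recognizer.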

Build and usage

pip install -r requirements.txt
make
Run python3 whisper_vad.py --help to see usage.

GPU

GPU acceleration currently supports only CLBlast and AMD HIPBLAS.

CLBlast

Dependencies: libclblast, OpenCL

Build: WHISPER_CLBLAST=1 make

HIPBLAS

Dependencies: libhipblas, libamdhip64, librocblas

Build:

Build the original Whisper.cpp.
Copy the resulting ggml-cuda.o to whisper_cpp.
Run WHISPER_HIPBLAS=1 make.

MIT License
