Whisper.cpp speech-to-text engine combined with the Silero Voice Activity Detector (VAD). Skipping non-speech audio improves transcription speed and quality, and helps avoid model hallucinations.
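To illustrate why gating on voice activity helps, here is a minimal sketch of how detected speech segments could be padded and merged before being handed to the recognizer, so that long silent stretches are never transcribed. It is not taken from whisper_vad.py: the function name `merge_segments` and the parameters `max_gap` and `pad` are hypothetical, and the input is assumed to be `(start, end)` pairs in seconds as a VAD might report them.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Segment], max_gap: float = 0.5,
                   pad: float = 0.1) -> List[Segment]:
    """Pad each speech segment and merge neighbours separated by a short gap.

    max_gap and pad are illustrative defaults, not values used by the project.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # Add a little context on both sides, without going below zero.
        start = max(0.0, start - pad)
        end = end + pad
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous chunk: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Only the merged chunks would then be transcribed; everything between them is treated as silence and skipped.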
Run whisper_vad.py directly to transcribe any video or audio file into SRT subtitles, or import it as a library.
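For readers unfamiliar with the SRT format, the following sketch shows how timed text segments map onto subtitle entries. It is an illustration only, not the code in whisper_vad.py: `srt_timestamp` and `to_srt` are hypothetical helper names, and the input is assumed to be `(start, end, text)` tuples with times in seconds.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(entries: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 1.5, "Hello")])` yields a single entry numbered 1 spanning `00:00:00,000 --> 00:00:01,500`.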
- ffmpeg (command)
- openblas (system library)
- cffi
- torch
- scipy
- zhconv (Chinese post-processing)
pip install -r requirements.txt
make
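After installing, a quick check along the following lines can confirm that the dependencies listed above are visible from Python. This is a sketch under assumptions: `missing_dependencies` is a hypothetical helper, the import names are assumed to match the package names, and the openblas system library is not covered because it cannot be detected this way.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_dependencies(
    commands: Iterable[str] = ("ffmpeg",),
    modules: Iterable[str] = ("cffi", "torch", "scipy", "zhconv"),
) -> List[str]:
    """Return the names of commands and Python modules that cannot be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing
```

Calling `missing_dependencies()` with no arguments returns an empty list when everything is in place.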
Run python3 whisper_vad.py --help to see usage.
GPU acceleration currently supports only CLBlast and AMD hipBLAS.
Dependencies (CLBlast): libclblast, OpenCL
Build: WHISPER_CLBLAST=1 make
Dependencies (hipBLAS): libhipblas, libamdhip64, librocblas
Build:
- Build the original Whisper.cpp
- Copy ggml-cuda.o to whisper_cpp
- Run WHISPER_HIPBLAS=1 make