gumblex / whisper_vad

Whisper.cpp Speech-to-text with Voice Activity Detection


Whisper VAD

Whisper VAD combines the Whisper.cpp speech-to-text engine with the Silero Voice Activity Detector (VAD). This improves transcription speed and quality, and can help prevent the model from hallucinating.
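Pairing a VAD with Whisper generally means first locating the spans of audio that contain speech and then transcribing only those spans. Silero VAD is a neural model. The sketch below illustrates the same segmentation idea with a much simpler, dependency-free RMS-energy threshold instead. This is a swapped-in technique, not Silero and not the code in whisper_vad.py. The function name `speech_segments` and its default thresholds are hypothetical.

```python
import math

def _rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def speech_segments(samples, sample_rate=16000, frame_ms=30,
                    threshold=0.02, min_silence_ms=300):
    """Hypothetical energy-based VAD (a stand-in for Silero, not Silero itself).

    Returns a list of (start_seconds, end_seconds) spans in which frame energy
    exceeds `threshold`. Silences of up to `min_silence_ms` inside a span are
    bridged so that short pauses do not split a segment.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    max_gap = max(1, min_silence_ms // frame_ms)  # silent frames tolerated inside a span
    spans, start, last_voiced = [], None, None
    for offset in range(0, len(samples), frame_len):
        idx = offset // frame_len
        if _rms(samples[offset:offset + frame_len]) >= threshold:
            if start is None:
                start = idx
            last_voiced = idx
        elif start is not None and idx - last_voiced > max_gap:
            spans.append((start, last_voiced + 1))  # close the span after a long silence
            start = None
    if start is not None:
        spans.append((start, last_voiced + 1))
    frame_s = frame_len / sample_rate
    duration = len(samples) / sample_rate
    return [(a * frame_s, min(b * frame_s, duration)) for a, b in spans]
```

Each returned span could then be sliced out of the sample buffer and passed to the speech-to-text engine, with the span's start time added to the timestamps it produces. Silent stretches are never sent to the model at all.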

Run whisper_vad.py directly to transcribe video or audio files into SRT subtitles, or import it as a library.
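SRT is a plain-text format. Each cue consists of a 1-based index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The sketch below assumes transcription results are available as `(start_seconds, end_seconds, text)` tuples. That shape and the helpers `srt_timestamp` and `to_srt` are hypothetical illustrations, not whisper_vad.py's API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start, end, text) tuples as SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)
```

For example, `to_srt([(0.0, 2.5, "Hello"), (3.0, 4.25, "world")])` yields two numbered cues, the first spanning `00:00:00,000 --> 00:00:02,500`.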

Dependencies

  • ffmpeg (command-line tool)
  • openblas (system library)
  • cffi
  • torch
  • scipy
  • zhconv (Chinese post-processing)
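ffmpeg is listed as a command-line dependency. A common way to feed arbitrary media to a Whisper model is to have ffmpeg decode the input to mono 16 kHz PCM, because Whisper models operate on 16 kHz audio. The sketch below shows this approach. It assumes ffmpeg is on `PATH`, and the helper names `ffmpeg_decode_args` and `load_audio` are hypothetical, not functions exported by whisper_vad.py.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz input

def ffmpeg_decode_args(path: str, sample_rate: int = SAMPLE_RATE) -> list:
    """Build an ffmpeg command that writes mono signed 16-bit little-endian PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "-",  # write raw samples to stdout instead of a file
    ]

def load_audio(path: str, sample_rate: int = SAMPLE_RATE) -> list:
    """Decode an ffmpeg-readable file into mono float samples in [-1.0, 1.0)."""
    result = subprocess.run(ffmpeg_decode_args(path, sample_rate),
                            capture_output=True, check=True)
    pcm = array("h")
    pcm.frombytes(result.stdout)
    if sys.byteorder == "big":
        pcm.byteswap()  # s16le is little-endian; fix order on big-endian hosts
    return [s / 32768.0 for s in pcm]
```

Usage might look like `samples = load_audio("lecture.mp4")`. Decoding through a pipe avoids writing a temporary WAV file.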

Build and usage

  1. pip install -r requirements.txt
  2. make
  3. Run python3 whisper_vad.py --help to see usage.

GPU

GPU acceleration is currently supported only through CLBlast and AMD HIPBLAS.

CLBlast

Dependencies: libclblast, OpenCL

Build: WHISPER_CLBLAST=1 make

HIPBLAS

Dependencies: libhipblas, libamdhip64, librocblas

Build:

  1. Build the original Whisper.cpp
  2. Copy ggml-cuda.o from that build to whisper_cpp
  3. WHISPER_HIPBLAS=1 make

About


License: MIT License

