Python command line utility wrappers for Whispercpp and other speech-to-text utilities
This is mainly a set of useful scripts to automate Whispercpp processing, including:
- Automatic conversion of any video or audio format ffmpeg supports to the WAV format Whispercpp needs
- You need to have a working executable version of whisper.cpp; you can either place that in the root of this repo as whispercpp or give the path using the -w flag
- Place your models in the models folder. By default audio2text.py will look for ggml-large.bin in the models folder
- ffmpeg is required and should be somewhere in your $PATH
- You might want to make a virtual environment and then install the requirements.txt, e.g.
python -m venv .env
source .env/bin/activate
pip install -U pip
pip install -r requirements.txt
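The requirements listed above can be checked before a first run. The following is a minimal sketch, not part of this repo: the function name `missing_prerequisites` is hypothetical, and the default locations (`whispercpp` in the repo root, `models/ggml-large.bin`) are simply the ones described above.

```python
import os
import shutil
from pathlib import Path


def missing_prerequisites(repo_root=".", whisper_path=None, model_path=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = Path(repo_root)
    # Defaults mirror the README text: ./whispercpp and models/ggml-large.bin.
    whisper = Path(whisper_path) if whisper_path else root / "whispercpp"
    model = Path(model_path) if model_path else root / "models" / "ggml-large.bin"

    problems = []
    # ffmpeg only needs to be discoverable through $PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found in $PATH")
    # The whisper.cpp binary must exist and be executable.
    if not (whisper.is_file() and os.access(whisper, os.X_OK)):
        problems.append(f"no executable whisper.cpp binary at {whisper}")
    # The model is a plain file that only needs to exist.
    if not model.is_file():
        problems.append(f"model file not found at {model}")
    return problems
```

Calling `missing_prerequisites()` from the repo root and printing the returned list gives a quick overview of what still needs to be set up.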
./audio2text.py
To convert the given berliner.ogg file in the test directory to a CSV file:
./audio2text.py -i test/berliner.ogg -o test/berliner -of csv
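Before transcription, the input has to be converted to WAV. The sketch below shows one way such a conversion step could look; it is an illustration, not the code audio2text.py actually runs. whisper.cpp expects 16 kHz, 16-bit PCM WAV input. The helper names and the `.16k.wav` output suffix are hypothetical, and the mono downmix is an assumption that would not suit the diarize option, which needs stereo audio.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build an ffmpeg call that writes 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", str(src),       # any container or codec ffmpeg can read
        "-vn",                # ignore video streams
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono (assumption, see lead-in)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src):
    """Convert src next to itself and return the path of the new WAV file."""
    # Hypothetical naming scheme; it avoids overwriting an input that is already .wav.
    dst = Path(src).with_suffix(".16k.wav")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to print the command in verbose mode or to inspect it without ffmpeg installed.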
srtparse.py converts SRT files to JSON, CSV and TXT using dataknead, for example:
./srtparse.py -i test/berliner.srt -o test/berliner.csv
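To illustrate what an SRT to CSV conversion involves, here is a rough sketch that uses only the Python standard library instead of dataknead. It is not the implementation of srtparse.py; the function names and the CSV column names (`index`, `start`, `end`, `text`) are assumptions made for this example.

```python
import csv
import io
import re

# Matches a timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(text):
    """Return a list of dicts with index, start, end and text for each SRT cue."""
    # Normalise line endings and drop a possible byte order mark.
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.search(lines[1])
        if not match:
            continue  # skip blocks without a valid timing line
        cues.append({
            "index": int(lines[0].strip()),
            "start": match.group(1),
            "end": match.group(2),
            # Multi-line subtitles are joined into a single cell.
            "text": " ".join(line.strip() for line in lines[2:]),
        })
    return cues


def cues_to_csv(cues):
    """Serialise parsed cues to a CSV string with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["index", "start", "end", "text"])
    writer.writeheader()
    writer.writerows(cues)
    return out.getvalue()
```

Reading a file with `open(path, encoding="utf-8").read()` and passing the result through `parse_srt` and `cues_to_csv` yields CSV text that can be written to disk.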
If you add the -v (verbose) flag, audio2text will give much more debug information.
You'll get this when running audio2text.py -h:
usage: audio2text.py [-h] [-di] [-i INPUT] [-l LANGUAGE] [-m MODEL_PATH] [-o OUTPUT] [-of {txt,vtt,srt,csv,words}] [-su] [-v]
                     [-w WHISPER_PATH]

options:
  -h, --help            show this help message and exit
  -di, --diarize        Diarize audio (only works for natural stereo audio)
  -i INPUT, --input INPUT
                        File to parse
  -l LANGUAGE, --language LANGUAGE
  -m MODEL_PATH, --model-path MODEL_PATH
                        Path to model
  -o OUTPUT, --output OUTPUT
  -of {txt,vtt,srt,csv,words}, --output-format {txt,vtt,srt,csv,words}
  -su, --speed-up
  -v, --verbose
  -w WHISPER_PATH, --whisper-path WHISPER_PATH
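For readers who want to wrap or extend the script, the option set above could be declared with argparse roughly as follows. This is a reconstruction from the help text only: the `build_parser` name is hypothetical, and defaults are left out because the help output does not show them.

```python
import argparse


def build_parser():
    """Declare the options listed in the audio2text.py help output."""
    parser = argparse.ArgumentParser(prog="audio2text.py")
    parser.add_argument("-di", "--diarize", action="store_true",
                        help="Diarize audio (only works for natural stereo audio)")
    parser.add_argument("-i", "--input", help="File to parse")
    parser.add_argument("-l", "--language")
    parser.add_argument("-m", "--model-path", help="Path to model")
    parser.add_argument("-o", "--output")
    parser.add_argument("-of", "--output-format",
                        choices=["txt", "vtt", "srt", "csv", "words"])
    parser.add_argument("-su", "--speed-up", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-w", "--whisper-path")
    return parser


if __name__ == "__main__":
    # Same arguments as the berliner.ogg example command.
    args = build_parser().parse_args(
        ["-i", "test/berliner.ogg", "-o", "test/berliner", "-of", "csv"]
    )
    print(args.input, args.output, args.output_format)
```

Because `-o` and `-of` are both registered as exact option strings, argparse resolves them without ambiguity.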
MIT © Hay Kranen