Utilities for
- Transcribing a set of audio files with Speech to Text (STT)
- Analyzing the error rate of the STT transcription against a known-good transcription
Check out this Medium article to learn more about how to use these utilities, including a YouTube video demonstration: https://medium.com/@marconoel/new-python-scripts-to-measure-word-error-rate-on-watson-speech-to-text-77ecaa513f60
Requires Python 3.x installation.
All of the watson-stt-wer-python dependencies are installed at once with pip:
pip install -r requirements.txt
Note: If you receive an SSL certificate error (CERTIFICATE_VERIFY_FAILED) when running the Python scripts, try the following commands to tell Python to use the system certificate store.
Windows
pip install --trusted-host pypi.org --trusted-host files.python.org python-certifi-win32
MacOS
Open a terminal, change to the location of your Python installation, and run Install Certificates.command, for example:
cd "/Applications/Python 3.6"
"./Install Certificates.command"
Create a copy of config.ini.sample named config.ini. You'll modify this copy in subsequent steps.
cp config.ini.sample config.ini
Each subsection describes which configuration parameters are needed.
Uses IBM Watson Speech to Text service to transcribe a folder full of audio files. Creates a CSV with transcriptions.
Update the parameters in your config.ini file.
Required configuration parameters:
- apikey - API key for your Speech to Text instance
- service_url - Reference URL for your Speech to Text instance
- base_model_name - Base model for Speech to Text transcription
Optional configuration parameters:
- language_model_id - Language model customization ID (comment out to use base model)
- acoustic_model_id - Acoustic model customization ID (comment out to use base model)
- grammar_name - Grammar name (comment out to use base model)
- stt_transcriptions_file - Output file for Speech to Text transcriptions
- audio_file_folder - Input directory containing your audio files
- reference_transcriptions_file - Reference file for manually transcribed audio files ("labeled data" or "ground truth"). If present, will be merged into stt_transcriptions_file as a "Reference" column
- stemming - If True, pre-processing stems words with the Porter stemmer. Stemming treats the singular and plural of a word as equivalent, rather than as a word error.
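As a rough illustration of how these parameters might be read and checked, the sketch below parses INI text with Python's standard configparser module and reports any required keys that are missing. The section name SpeechToText, the sample values, and both helper functions are assumptions made for this example; check config.ini.sample for the actual layout.

```python
import configparser

# Hypothetical config contents. The section name "SpeechToText" and all
# values are assumptions for illustration, not taken from config.ini.sample.
SAMPLE_CONFIG = """
[SpeechToText]
apikey = your-api-key
service_url = https://example.invalid/stt
base_model_name = en-US_NarrowbandModel
; language_model_id = your-customization-id
"""

REQUIRED_KEYS = ("apikey", "service_url", "base_model_name")


def missing_required_keys(config_text, section="SpeechToText"):
    """Return the required keys that are absent or empty in the section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


def optional_value(config_text, key, section="SpeechToText"):
    """Return an optional setting, or None when it is commented out or missing."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get(section, key, fallback=None)


print(missing_required_keys(SAMPLE_CONFIG))
print(optional_value(SAMPLE_CONFIG, "language_model_id"))
```

Because the language_model_id line is commented out in the sample, the lookup falls back to None, which matches the "comment out to use base model" convention described above.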
Assuming your configuration is in config.ini, transcribe all the audio files in the folder named by the audio_file_folder parameter with the following command:
python transcribe.py config.ini
The transcriptions will be stored in a CSV file named by the stt_transcriptions_file parameter, in a format like the one below:
Audio File | Transcription |
---|---|
file1.wav | The quick brown fox |
file2.wav | jumped over the lazy dog |
A third column, "Reference", will be included with the reference transcription if a reference_transcriptions_file is found.
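The merge of reference transcriptions into the output can be pictured with a small sketch like the one below. It is not the logic of transcribe.py; the function name merge_reference is hypothetical, and the column names "Audio File Name", "Hypothesis", and "Reference" are assumed here for illustration.

```python
import csv
import io


def merge_reference(transcriptions_csv, reference_csv):
    """Add a "Reference" column to transcription rows, matched on audio file name.

    Column names are assumptions for this sketch; rows without a matching
    reference get an empty string.
    """
    references = {
        row["Audio File Name"]: row["Reference"]
        for row in csv.DictReader(io.StringIO(reference_csv))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(transcriptions_csv)):
        row = dict(row)
        row["Reference"] = references.get(row["Audio File Name"], "")
        merged.append(row)
    return merged


# Made-up sample data for the sketch.
transcriptions = (
    "Audio File Name,Hypothesis\n"
    "./file1.wav,the quick brown fox\n"
    "./file2.wav,jumped over the lazy dog\n"
)
reference = (
    "Audio File Name,Reference\n"
    "./file1.wav,the quick brown fox\n"
)

for merged_row in merge_reference(transcriptions, reference):
    print(merged_row)
```

Matching on the audio file name is why the file names in both CSV files need to be written the same way, including any path prefix.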
Simple Python package to approximate the Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL) and Word Information Preserved (WIP) of one or more transcripts.
Your config file must set the reference_transcriptions_file and stt_transcriptions_file properties.
- Reference file (reference_transcriptions_file) is a CSV file with at least columns called Audio File Name and Reference. The Reference is the actual transcription of the audio file (also known as the "ground truth" or "labeled data"). NOTE: In your audio file name, make sure you put the full path (e.g. ./audio1.wav)
- Hypothesis file (stt_transcriptions_file) is a CSV file with at least columns called Audio File Name and Hypothesis. The Hypothesis is the transcription of the audio file by the Speech to Text engine. The transcribe.py script can create this file.
python analyze.py config.ini
Use the experiment.py script to execute a series of Transcription/Analyze experiments where configuration settings may change for each experiment. This option will require customization to set up for the specific configuration to be tested. Changes should be made in the run_all_experiments function.
python experiment.py config.ini
The script creates two output files, with the file names specified by the details_file and summary_file properties.
- Details (details_file) is a CSV file with rows for each audio sample, including reference and hypothesis transcription and specific transcription errors
- Summary (summary_file) is a JSON file with metrics for total transcriptions and overall word and sentence error rates.
- WER (word error rate), commonly used in ASR assessment, measures the cost of restoring the output word sequence to the original input sequence.
- MER (match error rate) is the proportion of I/O word matches which are errors.
- WIL (word information lost) is a simple approximation to the proportion of word information lost which overcomes the problems associated with the RIL (relative information lost) measure that was proposed half a century ago.
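To make these measures concrete, the sketch below computes them from the counts of hits (H), substitutions (S), deletions (D) and insertions (I) in one minimum-edit word alignment, using the commonly cited definitions WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), WIP = (H/N_ref)(H/N_hyp) and WIL = 1 - WIP. It is a simplified pure-Python illustration that assumes a non-empty reference and whitespace tokenization; it is not the code used by analyze.py or jiwer, and the function names are hypothetical.

```python
def align_counts(reference, hypothesis):
    """Return (hits, substitutions, deletions, insertions) for one minimum-edit alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # table[i][j] = (cost, hits, subs, dels, ins) for ref[:i] against hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, 0, i, 0)  # only deletions
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, 0, j)  # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, h, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (c, h + 1, s, d, n)      # hit
            else:
                diagonal = (c + 1, h, s + 1, d, n)  # substitution
            c, h, s, d, n = table[i - 1][j]
            deletion = (c + 1, h, s, d + 1, n)
            c, h, s, d, n = table[i][j - 1]
            insertion = (c + 1, h, s, d, n + 1)
            # Keep the cheapest option; ties favor the diagonal move.
            table[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    return table[len(ref)][len(hyp)][1:]


def error_metrics(reference, hypothesis):
    """Compute WER, MER, WIL and WIP; assumes the reference has at least one word."""
    hits, subs, dels, ins = align_counts(reference, hypothesis)
    errors = subs + dels + ins
    ref_len = hits + subs + dels
    hyp_len = hits + subs + ins
    wip = (hits / ref_len) * (hits / hyp_len) if hyp_len else 0.0
    return {
        "wer": errors / ref_len,
        "mer": errors / (hits + errors),
        "wil": 1.0 - wip,
        "wip": wip,
    }


metrics = error_metrics("the quick brown fox", "the quick brown box jumped")
print(metrics)
```

In this example the alignment has three hits, one substitution (fox/box) and one insertion (jumped), so WER is 2/4, MER is 2/5, and WIP is (3/4)(3/5). Note that WER can exceed 1 when the hypothesis contains many insertions, whereas MER and WIL stay between 0 and 1.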
Project page of the Python module JiWER: https://pypi.org/project/jiwer/
It computes the minimum-edit distance between the ground-truth sentence and the hypothesis sentence of a speech-to-text API. The minimum-edit distance is calculated using the python C module python-Levenshtein.
The models.py script has wrappers for many model-related tasks, including creating models, updating training contents, getting model details, and training models.
Update the parameters in your config.ini file.
Required configuration parameters:
- apikey - API key for your Speech to Text instance
- service_url - Reference URL for your Speech to Text instance
- base_model_name - Base model for Speech to Text transcription
For general help, execute:
python models.py
The script requires a type (one of base_model, custom_model, corpus, word, grammar) and an operation (one of list, get, create, update, delete).
The script optionally takes a config file as an argument with -c config_file_name_goes_here; otherwise it uses the default file config.ini, which contains the connection details for your Speech to Text instance.
Depending on the specified operation, the script also accepts a name, description, and file for an associated resource. For instance, new custom models should have a name and description, and a corpus should have a name and associated file.
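The command-line shape described above can be sketched with argparse as follows. This is not the source of models.py: only the short flags (-c, -o, -t, -n, -d, -f) and the listed types and operations come from this document, while the dest names, help strings, and the build_parser function are assumptions made for illustration.

```python
import argparse

TYPES = ["base_model", "custom_model", "corpus", "word", "grammar"]
OPERATIONS = ["list", "get", "create", "update", "delete"]


def build_parser():
    """Build a parser that mirrors the flags described for models.py (a sketch only)."""
    parser = argparse.ArgumentParser(description="Sketch of a models.py-style command line")
    parser.add_argument("-c", dest="config_file", default="config.ini",
                        help="config file with connection details")
    parser.add_argument("-o", dest="operation", required=True, choices=OPERATIONS,
                        help="operation to perform")
    parser.add_argument("-t", dest="type", required=True, choices=TYPES,
                        help="kind of resource to operate on")
    parser.add_argument("-n", dest="name", help="name of the resource")
    parser.add_argument("-d", dest="description", help="description of the resource")
    parser.add_argument("-f", dest="file", help="file associated with the resource")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-o", "create", "-t", "custom_model", "-n", "model1", "-d", "my first model"]
)
print(args.operation, args.type, args.name, args.config_file)
```

Restricting -o and -t with choices means an unsupported value is rejected by the parser before any call to the service is attempted.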
List all base models:
python models.py -o list -t base_model
List all custom models:
python models.py -o list -t custom_model
Create a custom model:
python models.py -o create -t custom_model -n "model1" -d "my first model"
Add a corpus file for a custom model (the custom model's customization_id is stored in config.ini.model1, and corpus1.txt contains the corpus contents):
python models.py -c config.ini.model1 -o create -t corpus -n "corpus1" -f "corpus1.txt"
List all corpora for a custom model (the custom model's customization_id is stored in config.ini.model1):
python models.py -c config.ini.model1 -o list -t corpus
Train a custom model (the custom model's customization_id is stored in config.ini.model1):
python models.py -c config.ini.model1 -o update -t custom_model
Note that some parameter combinations are not possible. The supported operations all wrap the SDK methods documented at https://cloud.ibm.com/apidocs/speech-to-text.