Made in Vancouver, Canada by Picovoice
This repo is a minimalist and extensible framework for benchmarking different noise suppression engines on 16kHz monaural speech data across varying signal-to-noise ratios.
The only data set currently implemented is the synthetic, no-reverb
part of the test set of the first installment of
the Microsoft deep noise suppression challenge at Interspeech 2020. It consists of 150 noisy files across various SNR
levels as well as their clean reference files. The data is available
within Microsoft's DNS-Challenge repository.
Either clone the whole repo and switch to the interspeech2020/master
branch, or run the following commands in a new
directory to check out only the required files with a sparse checkout:
git init
git remote add -f origin git@github.com:microsoft/DNS-Challenge.git
git sparse-checkout init
git switch interspeech2020/master
git sparse-checkout set datasets/test_set/synthetic/no_reverb LICENSE README.md
The --data-folder
argument required for this benchmark needs to be set to
/PATH/TO/DNS-Challenge/datasets/test_set/synthetic/no_reverb
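Before running the benchmark, it can help to confirm that the sparse checkout actually populated the data folder. The snippet below is a minimal sketch, not part of this repo: `count_wav_files` is a hypothetical helper, and it assumes the audio is stored as `.wav` files, possibly inside subfolders of the data folder. It counts the files per top-level subfolder so the totals can be compared against the 150 noisy files and their clean references mentioned above.

```python
from pathlib import Path


def count_wav_files(data_folder):
    """Count .wav files under `data_folder`, grouped by top-level subfolder.

    Hypothetical helper for sanity-checking a download; files that sit
    directly in `data_folder` are reported under the key ".".
    """
    root = Path(data_folder)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")

    counts = {}
    for wav in root.rglob("*.wav"):
        relative = wav.relative_to(root)
        # Use the first path component below the root as the group name.
        key = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python3 check_data.py /PATH/TO/DNS-Challenge/datasets/test_set/synthetic/no_reverb
    for folder, n in sorted(count_wav_files(sys.argv[1]).items()):
        print(f"{folder}: {n} wav files")
```

An empty result usually means the sparse checkout did not fetch the audio files or the path is wrong.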
Short-time objective intelligibility (STOI) is an intrusive metric, meaning that it requires the clean reference signal. It quantifies the similarity between the denoising output and the clean reference. The metric is a value between 0 and 1, where 1 means that the denoising result is exactly equal to the reference.
Real-time factor (RTF) is the ratio of CPU (processing) time to the length of the input speech file. A noise suppression engine with lower RTF is more computationally efficient.
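The RTF definition above can be illustrated with a short sketch. This is not the benchmark's own measurement code: `real_time_factor` and `measure_rtf` are hypothetical names, and `process` stands in for any in-process noise suppression call. Note that `time.process_time` only counts CPU time of the current Python process, so an engine that runs as an external binary would need a different timing approach.

```python
import time


def real_time_factor(cpu_seconds, audio_seconds):
    """Return CPU time divided by audio duration (lower is more efficient)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(process, samples, sample_rate=16000):
    """Run `process(samples)` once and return its real-time factor.

    `process` is a hypothetical stand-in for an engine's processing call;
    `samples` is a sequence of audio samples at `sample_rate` Hz.
    """
    start = time.process_time()
    process(samples)
    cpu_seconds = time.process_time() - start
    return real_time_factor(cpu_seconds, len(samples) / sample_rate)


if __name__ == "__main__":
    # One second of silence at 16 kHz and a trivial placeholder "engine".
    one_second = [0] * 16000
    print(f"RTF: {measure_rtf(sum, one_second):.4f}")
```

For example, an engine that spends 1 second of CPU time on a 4-second file has an RTF of 0.25, so it runs four times faster than real time on that machine.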
This benchmark has been developed and tested on Ubuntu 20.04 using Python 3.8.
- Install FFmpeg
- Download the data
- Install the requirements:
pip3 install -r requirements.txt
In the following commands, replace ${DATASET}
with one of the supported datasets and ${DATA_FOLDER}
with the path
to the dataset folder. See Data for details.
In order to remix the dataset at a specific signal-to-noise ratio (SNR), add --remix-snr-db ${SNR_DB}
to the
arguments to python3 benchmark.py
and replace ${SNR_DB}
with your SNR in dB.
The mixer separates the clean data from the noise, scales the noise, and mixes them together again. The scaling is done in such a way that
To compute the
Clone the RNNoise repository from https://gitlab.xiph.org/xiph/rnnoise
and follow the build instructions. Then run the following command, replacing ${RNNOISE_FOLDER}
with the path to the
root folder of the RNNoise repository.
python3 benchmark.py \
--engine mozilla_rnnoise \
--dataset ${DATASET} \
--data-folder ${DATA_FOLDER} \
--rnnoise-demo-path ${RNNOISE_FOLDER}/examples/rnnoise_demo
Note that since RNNoise operates at 48kHz, this benchmark internally resamples the audio.
Replace ${DATASET}
with one of the supported datasets, ${DATA_FOLDER}
with the path to the dataset folder, and
${PICOVOICE_ACCESS_KEY}
with the AccessKey obtained from Picovoice Console.
python3 benchmark.py \
--engine picovoice_koala \
--dataset ${DATASET} \
--data-folder ${DATA_FOLDER} \
--picovoice-access-key ${PICOVOICE_ACCESS_KEY}
Engine | Interspeech2020 |
---|---|
Original (no enhancement) | 0.915 |
Mozilla RNNoise | 0.925 |
Picovoice Koala | 0.959 |
Measurements were carried out on an Ubuntu 20.04 machine with an Intel CPU (Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz), 64 GB of
RAM, and NVMe storage.
Engine | RTF |
---|---|
Mozilla RNNoise | 0.02 |
Picovoice Koala | 0.03 |