This repository contains one million 5-second segments of augmented voice and noise combinations with labels.
- Duration: 2M files of 5 seconds each, plus 50k test files.
- Audio format: WAV files with 16kHz sampling rate and 16 bit depth
- Signal-to-noise ratio (SNR): from 3 to 30 dB
- Source voice speedup or slowdown: from 0.8 to 1.5
- Synthetic and Real Room Impulse Response (RIR) reverberation
- Encoding codecs are included in half of the samples: low/high quality mp3, G2.111
- Training Dataset: https://shared.korshakov.com/datasets/supervad-2/vad_train.tar.gz
- Testing Dataset: https://shared.korshakov.com/datasets/supervad-2/vad_test.tar.gz
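To make the SNR range above concrete, here is a minimal, dependency-free sketch of mixing a voice signal with noise at a chosen SNR. It is not taken from synthesize.py. The function names, the test signals and the uniform sampling of the SNR are all assumptions made for illustration.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(voice, noise, snr_db):
    """Scale `noise` so the voice-to-noise ratio equals `snr_db`, then add it to `voice`."""
    # SNR_dB = 20 * log10(rms_voice / rms_noise), solved for the target noise level.
    target_noise_rms = rms(voice) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [v + gain * n for v, n in zip(voice, noise)]


# Hypothetical signals: a 440 Hz tone stands in for voice, Gaussian noise for background.
sample_rate = 16000
voice = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in range(sample_rate)]

# Assumption: the SNR is drawn uniformly from the 3 to 30 dB range listed above.
snr_db = rng.uniform(3, 30)
mixed = mix_at_snr(voice, noise, snr_db)
```

Lower SNR values give louder noise relative to the voice, so 3 dB is the hardest case in the listed range.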
I am also publishing the source files used for mixing. They are all WAV files with a 16kHz sampling rate:
- Speech Training: https://shared.korshakov.com/datasets/supervad-2/speech_train.tar.gz
- Speech Testing: https://shared.korshakov.com/datasets/supervad-2/speech_test.tar.gz
- Non-Speech files: https://shared.korshakov.com/datasets/supervad-2/non_speech.tar.gz
- Synthetic RIR (cut to the beginning and normalized): https://shared.korshakov.com/datasets/supervad-2/rir_synthetic.tar.gz
- Real RIR (cut to the beginning and normalized): https://shared.korshakov.com/datasets/supervad-2/rir_real.tar.gz
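After unpacking an archive, you may want to confirm that a file has the stated 16kHz sampling rate. The sketch below uses only the standard library `wave` module. The function name and its parameters are hypothetical, and the optional sample-width check assumes plain PCM WAV files, since `wave` raises `wave.Error` on other encodings.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=None):
    """Return True if the WAV file at `path` has the expected sampling rate.

    If `expected_sample_width` is given (in bytes, so 2 means 16 bit),
    the sample width must match as well.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            return False
        if expected_sample_width is not None:
            return wav.getsampwidth() == expected_sample_width
        return True
```

For example, `check_wav_format("some_file.wav", expected_sample_width=2)` would check for 16kHz and 16 bit; the file name here is a placeholder.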
- v2 - filtered out some files with overly loud background voices, removed some songs that also contained voice
- v1 - initial release
- Musan (CC BY 4.0) - Clean Voice and Noises
- SLR26 (CC BY 4.0) - Synthetic RIR
- SLR28 (Apache 2.0) - Real RIR
- VOiCES (CC BY 4.0) - Clean Voice, Noises and RIR
- DNS-4 (Public Domain/CC BY 4.0/Attr) - Clean Voice and Noises
- Realistic urban sound mixture dataset (CC BY 4.0) - Noises
- Common Voice 16.0 (Mozilla Public License 2.0) - Unused for now
Caution
Downloading and synthesizing the dataset requires about 8TB of disk space and several hours to download, unpack and synthesize.
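Given the disk requirement above, it can be worth checking free space before starting. This is a small sketch using the standard library. The function name is hypothetical, and reading "about 8TB" as 8 * 10^12 bytes is an assumption.

```python
import shutil

# Assumption: "about 8TB" is read as 8 * 10**12 bytes.
REQUIRED_BYTES = 8 * 10**12


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if not has_enough_space("."):
    print("Warning: less than about 8TB free on the current filesystem")
```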
To download the source datasets, you can invoke the download.sh script. This script requires aria2.
./download.sh
The scripts have very few dependencies, which you probably already have installed.
pip install tqdm torch torchaudio soundfile
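To see which of the third-party packages named above are still missing from your environment, a quick check like the following may help. The helper name is hypothetical, and the module list assumes each package is imported under the same name it is installed with.

```python
import importlib.util

# Third-party modules from the install command above.
REQUIRED_MODULES = ["tqdm", "torch", "torchaudio", "soundfile"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
```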
Before synthesizing the dataset, you need to prepare the source datasets. To do so, you can invoke the prepare.py script.
python3 prepare.py
To synthesize the dataset, you can invoke the synthesize.py script.
python3 synthesize.py
To package the dataset, you need tar and pigz to be installed.
./pack.sh
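If tar and pigz are not available, a directory can also be packed into a .tar.gz archive with the Python standard library. This is a substitute technique, not what pack.sh does: it compresses on a single thread, so it will likely be much slower than pigz on a dataset of this size. The function name is hypothetical.

```python
import tarfile
from pathlib import Path


def pack_directory(source_dir, archive_path):
    """Write `source_dir` into a gzip-compressed tar archive at `archive_path`."""
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname stores only the directory name inside the archive, not its full path.
        archive.add(str(source), arcname=source.name)
    return Path(archive_path)
```

A call such as `pack_directory("dataset", "dataset.tar.gz")` would produce an archive that unpacks into a single `dataset` directory; both paths here are placeholders.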
CC BY 4.0