ex3ndr / supervad-dataset

SuperVAD dataset for voice activity detection training

🚀 SuperVAD dataset

This repository contains one million 5-second segments of augmented voice and noise mixtures, with labels.

Dataset

  • Size: 2M files, 5 seconds each, plus 50k test files
  • Audio format: WAV, 16kHz sampling rate, 16-bit depth
  • Signal-to-noise ratio (SNR): from 3 to 30 dB
  • Source voice speed factor (slow-down or speed-up): from 0.8 to 1.5
  • Synthetic and real room impulse response (RIR) reverberation
  • Codecs applied to half of the samples: low- and high-quality MP3, G2.111
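The SNR range above can be illustrated with a minimal sketch of mixing a voice clip with noise at a random SNR between 3 and 30 dB. This is not the repository's synthesize.py: `mix_at_snr` and `random_snr_mix` are hypothetical helpers, NumPy is an assumed choice (the scripts themselves depend on torch), and both signals are assumed to be float arrays of equal length at 16kHz.

```python
import numpy as np


def mix_at_snr(voice: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the voice-to-noise power ratio equals `snr_db`, then add it to `voice`."""
    if voice.shape != noise.shape:
        raise ValueError("voice and noise must have the same shape")
    voice_power = np.mean(voice ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return voice.copy()  # silent noise: nothing to scale
    # SNR_dB = 10 * log10(P_voice / P_noise_scaled), solved for the scaled noise power
    target_noise_power = voice_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return voice + gain * noise


def random_snr_mix(voice, noise, rng: np.random.Generator, low_db=3.0, high_db=30.0):
    """Draw an SNR uniformly from [low_db, high_db) and mix at that SNR."""
    snr_db = rng.uniform(low_db, high_db)
    return mix_at_snr(voice, noise, snr_db), snr_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16_000
    t = np.arange(5 * sr) / sr                   # one 5-second segment
    voice = 0.5 * np.sin(2 * np.pi * 220 * t)    # stand-in for a speech clip
    noise = rng.standard_normal(5 * sr)          # stand-in for a noise clip
    mixed, snr = random_snr_mix(voice, noise, rng)
    print(f"mixed at {snr:.1f} dB, peak {np.max(np.abs(mixed)):.2f}")
```

The mixture can exceed the [-1, 1] range, so it would need peak normalization or clipping before being written as 16-bit PCM.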

Downloads

Extra downloads

I am also publishing the source files used for mixing; they are all WAV files with a 16kHz sampling rate:
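A quick sanity check that a downloaded file matches the expected format can be done with Python's standard wave module. This is a minimal sketch: `check_wav` is a hypothetical helper, the 2-byte (16-bit) default assumes the same bit depth as the synthesized dataset, and wave only reads uncompressed PCM WAV files.

```python
import wave


def check_wav(path, expected_rate=16_000, expected_sample_width=2, expected_seconds=None):
    """Return a list of header mismatches for a PCM WAV file (an empty list means OK)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        frames = wav.getnframes()
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz != {expected_rate} Hz")
    if width != expected_sample_width:
        problems.append(f"{8 * width}-bit samples != {8 * expected_sample_width}-bit")
    if expected_seconds is not None:
        expected_frames = int(expected_seconds * expected_rate)
        if frames != expected_frames:
            problems.append(f"{frames} frames != {expected_frames} expected")
    return problems
```

For example, `check_wav("clip.wav", expected_seconds=5)` would flag a dataset segment that is not exactly 80,000 samples long.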

Versions

  • v2 - filtered out files with overly loud background voices and removed songs that also contained vocals
  • v1 - initial release

References

Reproduction

Caution

Downloading and synthesizing the dataset requires about 8TB of disk space and several hours to download, unpack and synthesize.
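Before starting, it may be worth confirming that the target filesystem actually has that much free space. A minimal sketch using the standard shutil.disk_usage; `has_enough_space` is a hypothetical helper, and treating 8TB as 8 * 10**12 bytes (decimal terabytes) is an assumption.

```python
import shutil

# About 8 TB, per the estimate above; decimal terabytes are an assumption
REQUIRED_BYTES = 8 * 10**12


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return (ok, free_bytes) for the filesystem that contains `path`."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes, free


if __name__ == "__main__":
    ok, free = has_enough_space(".")
    print(f"free: {free / 10**12:.2f} TB -> {'OK' if ok else 'not enough space'}")
```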

Downloading sources

To download the source datasets, run the download.sh script. It requires aria2.

./download.sh
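A minimal sketch for checking that the required command-line tools are on PATH before running the shell scripts, using the standard shutil.which. `missing_tools` is a hypothetical helper; aria2 installs its client as `aria2c`, and the same check covers tar and pigz, which the packaging step needs.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot find."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # aria2c for download.sh; tar and pigz for pack.sh
    missing = missing_tools(["aria2c", "tar", "pigz"])
    print("all tools found" if not missing else f"missing: {', '.join(missing)}")
```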

Installing dependencies

The scripts have very few dependencies, and you probably already have them installed:

pip install tqdm torch torchaudio soundfile
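A minimal sketch for confirming these packages are importable before running the scripts, using the standard importlib.util.find_spec. `missing_modules` is a hypothetical helper; glob is left out because it ships with Python's standard library.

```python
import importlib.util

# Third-party packages used by the scripts
REQUIRED_MODULES = ["tqdm", "torch", "torchaudio", "soundfile"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all dependencies found" if not missing else f"missing: {', '.join(missing)}")
```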

Preparing source datasets

Before synthesizing the dataset, prepare the source datasets by running the prepare.py script:

python3 prepare.py

Synthesizing the dataset

To synthesize the dataset, run the synthesize.py script:

python3 synthesize.py

Packaging the dataset

Packaging the dataset requires tar and pigz:

./pack.sh
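Since pack.sh compresses with pigz, which writes standard gzip output, the resulting archives should be readable as ordinary .tar.gz files; that, and the archive layout, are assumptions here. A minimal sketch for reading WAV files out of such an archive with the standard tarfile module, streaming members instead of unpacking everything to disk. `iter_wav_members` is a hypothetical helper.

```python
import tarfile


def iter_wav_members(archive_path):
    """Yield (name, data) for every .wav file in a gzip-compressed tar archive."""
    # "r|gz" reads the archive as a forward-only stream, so nothing is extracted to disk;
    # each member must be read before moving on to the next one
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".wav"):
                yield member.name, tar.extractfile(member).read()
```

Streaming keeps memory and disk use bounded by one file at a time, which matters at this dataset's size.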

License

CC BY 4.0
