WSJ Data Preparation

This repository provides some useful scripts for preparing WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert sphere to waveform
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
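
For readers unfamiliar with the add-noise step, the following is a minimal sketch of mixing a clean utterance with noise at a target signal-to-noise ratio (SNR). It is not taken from wsj0/2_prep_noisy_data.py: the function name `mix_at_snr`, its arguments, and the choice to loop short noise and cut a random segment are all assumptions made for illustration. Run the script with -h to see what it actually supports.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Add noise to clean speech so that the mixture has the requested SNR (dB).

    Hypothetical helper for illustration only; not the repository's code.
    Both inputs are 1-D arrays sampled at the same rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise if it is shorter than the speech (an assumed policy).
    if len(noise) < len(clean):
        repeats = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, repeats)

    # Cut a random noise segment with the same length as the speech.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise segment is silent; cannot scale it to an SNR")

    # SNR_dB = 10 * log10(clean_power / (scale**2 * noise_power)), solved for scale.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    speech = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a clean utterance
    babble = np.random.default_rng(0).normal(size=8000)  # stand-in for noise
    noisy = mix_at_snr(speech, babble, snr_db=5)
    print(noisy.shape)
```

The scale factor is computed from the noise segment that is actually mixed in, so the measured SNR of the result matches the requested value. A real pipeline would usually also guard against clipping before writing the mixture to disk.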

Public Dataset

There are some public datasets we can use, including noise, RIR and well-simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use tools/resample.py to down-sample the clean speech or the noise. Several open-source noise corpora are available.
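
As a quick way to verify the sample-rate requirement before mixing, the sketch below reads the rate from two WAV headers with Python's standard `wave` module and fails loudly on a mismatch. The helper names `wav_sample_rate` and `check_same_rate` are hypothetical and not part of this repository. The `wave` module only reads PCM WAV files, so this applies after the sphere files have been converted.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_same_rate(clean_path, noise_path):
    """Raise ValueError if the two files differ in sample rate.

    Hypothetical helper for illustration; returns the shared rate on success.
    """
    clean_sr = wav_sample_rate(clean_path)
    noise_sr = wav_sample_rate(noise_path)
    if clean_sr != noise_sr:
        raise ValueError(
            f"sample rate mismatch: clean={clean_sr} Hz, noise={noise_sr} Hz; "
            "resample one of them before mixing"
        )
    return clean_sr
```

Running such a check over the whole noise corpus once, before simulation, tends to be cheaper than discovering a mismatch partway through data generation.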

Room Impulse Response (RIR)

Noisy Speech Datasets

SUPERSEDED


MIT License
