wangkenpu / WSJ2WAV

Convert WSJ sphere format to waveform and do data simulation.

WSJ Data Preparation

This repository provides some useful scripts for preparing WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert sphere to waveform
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
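
The README does not document how wsj0/2_prep_noisy_data.py adds noise, so the following is only a minimal sketch of the usual technique: scale a noise signal so that the mixture reaches a chosen signal-to-noise ratio (SNR). The function names `mean_power` and `mix_at_snr` are hypothetical and not taken from the repository, and the code uses plain Python lists of float samples so that it runs without extra dependencies.

```python
import math
import random


def mean_power(samples):
    """Mean of the squared samples of a non-empty sequence of floats."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + gain * noise, with gain chosen to hit snr_db.

    Hypothetical helper, not the repository's implementation.
    The noise is repeated or truncated to the length of the clean signal.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Repeat the noise if it is shorter than the speech, then cut to length.
    repeats = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(clean)]

    clean_power = mean_power(clean)
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")

    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz stands in for clean speech.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    babble = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    noisy = mix_at_snr(speech, babble, snr_db=5.0)
    print(len(noisy))
```

In a real pipeline the samples would come from the converted waveform files, and the mixture may need to be rescaled afterwards to avoid clipping when it is written back as 16-bit PCM.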

Public Dataset

Several public datasets are available, covering noise, room impulse responses (RIRs), and well-simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use tools/resample.py to down-sample the clean speech or the noise so that the rates match. Some open-source noise corpora:

  1. Nonspeech100
  2. MUSAN
  3. freesound
  4. DEMAND
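
Since the noise and the clean speech must share one sample rate, it can help to check the rates before mixing. The sketch below reads the rate from each WAV header with Python's standard `wave` module. The helper names `wav_sample_rate` and `check_same_rate` are hypothetical, and the interface of tools/resample.py is not shown in this README, so the sketch only detects a mismatch and does not resample.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_same_rate(clean_path, noise_path):
    """Return the shared sample rate, or raise ValueError on a mismatch.

    Hypothetical helper, not part of the repository.
    """
    clean_rate = wav_sample_rate(clean_path)
    noise_rate = wav_sample_rate(noise_path)
    if clean_rate != noise_rate:
        raise ValueError(
            f"sample rate mismatch: {clean_path} is {clean_rate} Hz, "
            f"{noise_path} is {noise_rate} Hz; resample one of them first"
        )
    return clean_rate
```

Note that the `wave` module only handles uncompressed PCM WAV files, so this check applies after the sphere files have been converted to waveform.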

Room Impulse Response (RIR)

  1. OpenSLR
  2. AcouSP
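
An RIR is typically applied by convolving it with the clean speech to simulate reverberation. The README does not say how, or whether, the repository's scripts do this, so the following is only an illustrative sketch with hypothetical function names (`convolve`, `add_reverb`). It uses plain Python lists and a direct O(N*M) loop for clarity. For real recordings, an FFT-based convolution from a numerical library would be much faster.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two non-empty sequences of numbers."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing to the output
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_reverb(clean, rir):
    """Convolve speech with an RIR and trim to the original length.

    Hypothetical helper, not the repository's implementation.
    """
    if not clean or not rir:
        raise ValueError("clean and rir must be non-empty")
    return convolve(clean, rir)[: len(clean)]


if __name__ == "__main__":
    # A unit impulse through a two-tap "room" returns the taps themselves.
    print(add_reverb([1.0, 0.0, 0.0, 0.0], [1.0, 0.5]))
```

Trimming to the original length keeps the reverberant signal aligned with the clean reference, at the cost of cutting off the reverberation tail.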

Noisy Speech Datasets

  1. SUPERSEDED

About

License: MIT License


Languages

Python 87.1%, Shell 8.3%, Makefile 4.6%