Dataset

Question

dreamibor opened this issue 5 years ago

Hi, is there a way to create the training dataset? Specifically, what approach do you take to obtain separate speech and noise data?

AkojimaSLP · Answer 1 · Thu Nov 14 2019 06:22:20 GMT+0800 (China Standard Time)

Hi, thank you for your question.

How the training data is created
Each training sample is generated by randomly choosing one file from ./dataset/train/noise/ and one from ./dataset/train/speech/*. The two signals are then combined using a randomly chosen SNR and reverberation time. In the script "train.py", these simulated mixtures are generated on the fly without being written to disk, because storing every generated training file would quickly exceed the HDD capacity.
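To make the on-the-fly simulation more concrete, here is a minimal sketch in pure Python. It is not the code from "train.py". All function names, the SNR and RT60 ranges, and the synthetic impulse response (exponentially decaying white noise, a crude stand-in for a real room simulator) are hypothetical choices for illustration. The clips are assumed to be already loaded from ./dataset/train/... as lists of float samples and to be non-silent.

```python
import math
import random


def scale_to_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db.

    Assumes the noise clip is not all zeros (its power must be > 0).
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [gain * x for x in noise]


def synthetic_rir(rt60, sample_rate, rng):
    """Hypothetical impulse response: white noise under an exponential envelope
    whose amplitude has dropped by 60 dB (a factor of 1e-3) after rt60 seconds."""
    length = max(1, int(rt60 * sample_rate))
    decay = 3 * math.log(10) / rt60
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * n / sample_rate)
           for n in range(length)]
    rir[0] = 1.0  # direct-path component
    return rir


def convolve(x, h):
    """Linear convolution of x with h, truncated to len(x) samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y


def mixture_stream(speech_clips, noise_clips, snr_range=(-5.0, 20.0),
                   rt60_range=(0.2, 0.8), sample_rate=16000, seed=None):
    """Endlessly yield (noisy, clean, snr_db, rt60) tuples without touching disk."""
    rng = random.Random(seed)
    while True:
        speech = rng.choice(speech_clips)
        noise = rng.choice(noise_clips)
        # Loop or crop the noise so it matches the speech length.
        noise = [noise[i % len(noise)] for i in range(len(speech))]
        # Random reverberation time, then reverberate the speech.
        rt60 = rng.uniform(*rt60_range)
        reverberant = convolve(speech, synthetic_rir(rt60, sample_rate, rng))
        # Random SNR, then add the scaled noise.
        snr_db = rng.uniform(*snr_range)
        scaled_noise = scale_to_snr(reverberant, noise, snr_db)
        noisy = [s + n for s, n in zip(reverberant, scaled_noise)]
        yield noisy, speech, snr_db, rt60
```

Because the generator draws a new speech/noise pair, SNR and RT60 for every sample, the number of distinct training examples is effectively unbounded while disk usage stays at the size of the source clips. The pure-Python convolution here is only for readability. A real pipeline would use an array library, for example `scipy.signal.fftconvolve`.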
Separate speech and noise data
As you know, this approach requires a parallel corpus (separate recordings of noise and clean speech). Research often uses the CHiME corpus.

Regards,

Zhaoyu Zhang · Answer 2 · Tue Dec 10 2019 23:16:47 GMT+0800 (China Standard Time)

Thank you for your response! I think your answer solved my problem and I will close the issue.