Dataset
dreamibor opened this issue
Hi, is there a way to create the training dataset? I mean, what approach do you take to get separate speech and noise data?
Hi, thank you for your question.
- Way to create training data
  Training data is generated by randomly choosing one file each from ./dataset/train/noise/ and ./dataset/train/speech/*. The two audio signals are mixed at a randomly chosen SNR and reverberation time. In the script "train.py", the simulated speech is generated on the fly without writing files to the HDD, because storing a large number of training files would exceed the available disk capacity.
- Separate speech and noise data
  As you know, this approach needs a parallel corpus (noise and speech). Research often uses the CHiME corpus.
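For illustration, here is a minimal sketch of that kind of on-the-fly simulation. It is not the code in "train.py": the function names, the SNR and reverberation-time ranges, and the synthetic impulse response (decaying white noise instead of a measured or simulated room response) are all assumptions made for the example.

```python
import numpy as np


def synthetic_rir(rt60, sample_rate, rng):
    """Hypothetical stand-in for a room impulse response:
    white noise whose amplitude decays by 60 dB after rt60 seconds."""
    n = max(1, int(rt60 * sample_rate))
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # -60 dB amplitude at t = rt60
    rir = rng.standard_normal(n) * decay
    return rir / np.sqrt(np.sum(rir ** 2))  # normalize to unit energy


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add."""
    if len(noise) < len(speech):
        # Repeat short noise clips so they cover the whole utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = max(np.mean(noise ** 2), 1e-12)  # guard against silent noise
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def simulate_pair(speech, noise, sample_rate, rng,
                  snr_range=(0.0, 20.0), rt60_range=(0.2, 0.8)):
    """Return (noisy_input, clean_target) built in memory, with nothing written to disk.
    The default ranges are illustrative assumptions, not values from the repository."""
    snr_db = rng.uniform(*snr_range)
    rt60 = rng.uniform(*rt60_range)
    rir = synthetic_rir(rt60, sample_rate, rng)
    reverberant = np.convolve(speech, rir)[:len(speech)]
    noisy = mix_at_snr(reverberant, noise, snr_db)
    return noisy, speech
```

Calling simulate_pair once per training example, with speech and noise arrays loaded from the two directories, gives a fresh noisy/clean pair each time, so nothing has to be stored on disk.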
Regards,
Thank you for your response! I think your answer solved my problem and I will close the issue.