ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

The generated wav file is noise

kaidi-jin opened this issue · comments

I trained this model with the default parameters and the wavenet_params.json config file, but the wav files generated by the trained model are just white noise. The loss is around 1.563 after step 99999.

I also ran into messages like "XXX.wav was ignored as it contains only silence". I worked around this by setting --silence_threshold=0.1, and also tried --silence_threshold=0 (a sketch of how the threshold causes files to be dropped is below).
One last small question: I downloaded the VCTK corpus from https://datashare.is.ed.ac.uk/handle/10283/2651. The wav48 folder contains 109 speaker folders while txt contains only 108; the transcripts for p315 are missing, so I deleted the p315 audio from the wav48 folder (a quick check for this kind of mismatch is also sketched below).
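
For reference, here is a minimal sketch (not the repository's exact code, assuming a recent librosa where the RMS helper is librosa.feature.rms) of how an amplitude threshold leads to a file being reported as containing only silence: if every frame's energy is below the threshold, nothing is left after trimming, while a threshold of 0 keeps everything.

```python
import librosa
import numpy as np

def trim_silence(audio, threshold, frame_length=2048):
    """Remove audio before/after the frames whose RMS energy exceeds `threshold`.

    If the whole clip stays below the threshold, an empty array is returned,
    which is the situation that triggers the "contains only silence" message.
    """
    if threshold == 0:
        return audio  # threshold 0 keeps everything, including silent files
    energy = librosa.feature.rms(y=audio, frame_length=frame_length)
    frames = np.nonzero(energy > threshold)[1]
    if frames.size == 0:
        return audio[0:0]  # entirely below threshold -> "only silence"
    start, end = librosa.frames_to_samples([frames[0], frames[-1]])
    return audio[start:end]
```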
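
And a tiny self-contained check (my own, with a hypothetical corpus path) that lists speakers with audio in wav48 but no transcripts in txt:

```python
import os

corpus = 'VCTK-Corpus'  # hypothetical path to the extracted corpus
wav_speakers = set(os.listdir(os.path.join(corpus, 'wav48')))
txt_speakers = set(os.listdir(os.path.join(corpus, 'txt')))
print('speakers with audio but no transcripts:',
      sorted(wav_speakers - txt_speakers))
```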

By the way, training took about 11 hours on my TITAN V GPU. Is there anything wrong with my training configuration? The training time seems very short.

Can anyone help me with these issues? I have been struggling with this for several weeks.

Thank you very much!

WaveNet can't generate intelligible speech using only previous samples and a global condition (speaker ID). You need to feed the network a local condition such as MFCC or mel-spectrogram features.
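
For example, here is a minimal sketch (my own, assuming librosa; the function name and parameter values are just placeholders) of extracting frame-level log-mel features and repeating them so there is one conditioning vector per audio sample:

```python
import librosa
import numpy as np

def local_condition_features(wav_path, sample_rate=16000, hop_length=256):
    """Return (audio, per-sample conditioning features) for one file."""
    audio, _ = librosa.load(wav_path, sr=sample_rate)
    # 80-band mel spectrogram, one frame every `hop_length` samples.
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sample_rate, n_fft=1024, hop_length=hop_length, n_mels=80)
    log_mel = np.log(mel + 1e-5).T                      # shape: [frames, 80]
    # Upsample by repetition so each audio sample gets a feature vector,
    # which is what a sample-level local condition needs.
    per_sample = np.repeat(log_mel, hop_length, axis=0)[:len(audio)]
    return audio, per_sample
```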

@ryuclc, thank you for your advice. It sounds like a good approach, but I haven't seen it before and have no idea how to carry it out. Does it require changing the code?
For now I use the --wave_seed option to provide a seed sample, which gives better results than generating without one.
Overall, I will look for related information about this MFCC approach and try to use it.

@stuking, yes, it needs some extra code that adds the local condition inside the dilated convolutions. You can find how to do this in the WaveNet paper, or search for 'wavenet vocoder' on GitHub, where others have already implemented it (a rough sketch of a conditioned gated layer is below).
From the results of running my code, MFCC conditioning gives much better performance.
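
To illustrate the idea only, here is a rough TensorFlow sketch I wrote (not code from this repository) of the conditional gated activation z = tanh(W_f * x + V_f * h) * sigmoid(W_g * x + V_g * h), where the local condition h is projected with 1x1 convolutions and added inside the dilated layer:

```python
import tensorflow as tf

def gated_dilated_layer(x, h, dilation, residual_channels=32):
    """One dilated WaveNet layer with local conditioning.

    x: layer input, shape [batch, time, channels]
    h: per-sample conditioning features, shape [batch, time, cond_channels]
    """
    conv = lambda ch, k, d=1, pad='causal': tf.keras.layers.Conv1D(
        ch, k, dilation_rate=d, padding=pad)
    filt = conv(residual_channels, 2, dilation)(x)
    gate = conv(residual_channels, 2, dilation)(x)
    # 1x1 projections of the condition, added before the nonlinearities.
    filt += conv(residual_channels, 1, 1, 'same')(h)
    gate += conv(residual_channels, 1, 1, 'same')(h)
    out = tf.tanh(filt) * tf.sigmoid(gate)
    # Residual connection back to the layer input; `out` would also
    # feed the skip connections in a full model.
    residual = conv(int(x.shape[-1]), 1, 1, 'same')(out) + x
    return residual, out
```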

@ryuclc thanks for your help!!!