fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/


"Blank" attention plots

gabriel-souza-omni opened this issue

Hello. I am trying to train a Portuguese-speaking model. While the audio generated by the wavenet during its training sounds pretty good, the audio sounds garbled when I try to generalize. The only apparent issue is the attention plot shown below. Do you know what could cause this?

[Attached image: attention plot]

I think you have the same problem as the one in #77.

Also, there is another attention class you can use, called Attention(), in models/tacotron.py. I think it is Bahdanau attention. You might have to change LSA to this one in the Decoder class, as sketched below.
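In my copy of the repo the Decoder picks its attention network in its constructor, so the swap is a one-liner. A rough sketch, assuming the attribute is called attn_net and that Attention() takes the decoder dimensionality as its only argument (check your own models/tacotron.py, the names may differ):

```python
# models/tacotron.py -- inside Decoder.__init__() (sketch; names/args may differ)

# Original: location-sensitive attention
# self.attn_net = LSA(decoder_dims)

# Swap in the additive (Bahdanau-style) attention class instead
self.attn_net = Attention(decoder_dims)
```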

I'm currently training with Bahdanau attention on a Korean voice dataset (18 hours total), and this is my attention result from 0 to 55k steps.
(Note: I didn't follow the default training schedule in hparams.py; I followed the training schedule from the Tacotron paper.)

[Attached GIF: attention alignment from 0 to 55k steps]

Hi, I also had trouble getting attention to build on some custom datasets. What has helped best so far:

  1. In the LSA module, change the sigmoid activation to a softmax (scores = F.softmax(u, dim=1)). Once attention has built, this can be reverted without losing alignment.
  2. Start the schedule with a larger reduction factor (e.g. 12 instead of 7). Both changes are sketched below.
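For reference, here is roughly what both changes look like on my side. The softmax line is the one quoted above; the surrounding LSA lines and the tts_schedule values are only illustrative guesses, not the repo defaults, so adapt them to your own models/tacotron.py and hparams.py:

```python
# models/tacotron.py -- inside LSA.forward() (sketch; surrounding lines may differ)
u = u.squeeze(-1)
# Original smoothed-sigmoid scores:
# scores = torch.sigmoid(u) / torch.sigmoid(u).sum(dim=1, keepdim=True)
# Softmax scores while attention is building (can be reverted later):
scores = F.softmax(u, dim=1)

# hparams.py -- start the progressive schedule with a larger reduction factor
# (r, lr, step, batch_size); the values below are illustrative, not the defaults
tts_schedule = [(12, 1e-3,  10_000, 32),
                (7,  1e-4,  50_000, 32),
                (5,  1e-4, 100_000, 32),
                (2,  1e-4, 180_000, 16),
                (1,  1e-4, 350_000,  8)]
```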

Thanks guys. These suggestions worked really well, and I managed to get the attention working.

One more thing: it also helps to trim your audio dataset so the leading and trailing silence is removed from each clip. This makes it easier for attention to find a proper alignment between text and audio. A quick trimming sketch is below.
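If it helps, a minimal trimming sketch with librosa (librosa.effects.trim is a real function, but the folder names and the top_db threshold below are assumptions; adjust them for your dataset):

```python
# trim_silence.py -- strip leading/trailing silence before preprocessing (sketch)
from pathlib import Path

import librosa
import soundfile as sf

IN_DIR = Path("dataset/wavs")           # hypothetical input folder
OUT_DIR = Path("dataset/wavs_trimmed")  # hypothetical output folder
OUT_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(IN_DIR.glob("*.wav")):
    y, sr = librosa.load(wav_path, sr=None)             # keep original sample rate
    y_trimmed, _ = librosa.effects.trim(y, top_db=30)   # 30 dB is a guess; tune it
    sf.write(OUT_DIR / wav_path.name, y_trimmed, sr)
```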