fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/


"Blank" attention plots

gabriel-souza-omni opened this issue

Hello. I am trying to train a Portuguese-speaking model. While the audio generated by the wavenet during its training sounds pretty good, the audio sounds garbled when I try to generalize. The only apparent issue is the attention plot shown below. Do you know what could cause this?

[Attached image: attention plot]

I think you have the same problem as the one in #77.

Also, there is another attention class you can use, called Attention(), in models/tacotron.py. I think it is Bahdanau attention. You might have to change LSA to this one in the Decoder class, as sketched below.
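In my copy of the repo the Decoder picks its attention network in its constructor, so the swap is a one-liner. A rough sketch, assuming the attribute is called attn_net and that Attention() takes the decoder dimensionality as its only argument (check your own models/tacotron.py, the names may differ):

```python
# models/tacotron.py -- inside Decoder.__init__() (sketch; names/args may differ)

# Original: location-sensitive attention
# self.attn_net = LSA(decoder_dims)

# Swap in the additive (Bahdanau-style) attention class instead
self.attn_net = Attention(decoder_dims)
```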

I'm currently training with Bahdanau attention on a Korean voice dataset (18 hours total), and this is my attention result from 0 to 55k steps.
(Note: I didn't follow the default training schedule in hparams.py; I followed the training schedule from the Tacotron paper.)

[Attached GIF: attention alignment from 0 to 55k steps]

Hi, I also had trouble getting attention to build on some custom datasets. What has helped best so far:

  1. In the LSA module, change the sigmoid activation to a softmax (scores = F.softmax(u, dim=1)). Once attention has built, this can be reverted without losing alignment.
  2. Start the schedule with a larger reduction factor (e.g. 12 instead of 7). Both changes are sketched below.
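For reference, here is roughly what both changes look like on my side. The softmax line is the one quoted above; the surrounding LSA lines and the tts_schedule values are only illustrative guesses, not the repo defaults, so adapt them to your own models/tacotron.py and hparams.py:

```python
# models/tacotron.py -- inside LSA.forward() (sketch; surrounding lines may differ)
u = u.squeeze(-1)
# Original smoothed-sigmoid scores:
# scores = torch.sigmoid(u) / torch.sigmoid(u).sum(dim=1, keepdim=True)
# Softmax scores while attention is building (can be reverted later):
scores = F.softmax(u, dim=1)

# hparams.py -- start the progressive schedule with a larger reduction factor
# (r, lr, step, batch_size); the values below are illustrative, not the defaults
tts_schedule = [(12, 1e-3,  10_000, 32),
                (7,  1e-4,  50_000, 32),
                (5,  1e-4, 100_000, 32),
                (2,  1e-4, 180_000, 16),
                (1,  1e-4, 350_000,  8)]
```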

Thanks guys. These suggestions worked really well, and I managed to get the attention working.

One more thing: it also helps to trim your audio dataset so the leading and trailing silence is removed from each clip. This makes it easier for attention to find a proper alignment between text and audio. A quick trimming sketch is below.
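If it helps, a minimal trimming sketch with librosa (librosa.effects.trim is a real function, but the folder names and the top_db threshold below are assumptions; adjust them for your dataset):

```python
# trim_silence.py -- strip leading/trailing silence before preprocessing (sketch)
from pathlib import Path

import librosa
import soundfile as sf

IN_DIR = Path("dataset/wavs")           # hypothetical input folder
OUT_DIR = Path("dataset/wavs_trimmed")  # hypothetical output folder
OUT_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(IN_DIR.glob("*.wav")):
    y, sr = librosa.load(wav_path, sr=None)             # keep original sample rate
    y_trimmed, _ = librosa.effects.trim(y, top_db=30)   # 30 dB is a guess; tune it
    sf.write(OUT_DIR / wav_path.name, y_trimmed, sr)
```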