deterministic-algorithms-lab / Cross-Lingual-Voice-Cloning

Tacotron 2 - PyTorch implementation with faster-than-realtime inference, modified to enable cross-lingual voice cloning.

Attention Alignment Not Working

jinny1208 opened this issue · comments

commented

I am currently training the provided model with Korean and English datasets, with a total of 27 speakers.
As stated in the README.md, I added Korean to "symbols" as follows:

[Image: Korean characters added to the "symbols" list]
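For reference, the change described above might look something like this. This is a sketch assuming the NVIDIA Tacotron 2 `text/symbols.py` layout (`_pad`, `_punctuation`, `_letters`, `symbols`); the exact variable names and the chosen Korean character set in this repo may differ:

```python
# Hypothetical sketch of extending text/symbols.py with Korean characters.
# Variable names follow the NVIDIA Tacotron 2 layout (an assumption).
_pad = "_"
_punctuation = "!'(),.:;? "
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# One common choice for Korean is the Hangul Compatibility Jamo block
# (U+3131..U+3163), used when the text front-end decomposes syllables
# into individual jamo before lookup.
_korean = "".join(chr(c) for c in range(0x3131, 0x3164))

symbols = [_pad] + list(_punctuation) + list(_letters) + list(_korean)

# Symbols must be unique: the symbol-to-id table is built from this list.
assert len(symbols) == len(set(symbols))
```

Whatever character set is chosen, it must match what the text cleaner emits, or out-of-vocabulary characters will be silently dropped or raise lookup errors.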

The problem is that even after training the model for over 45,000 steps, the attention alignment is not forming.

[Image: attention alignment plot after 45,000+ steps, no alignment visible]

The target and predicted mel-spectrograms seem similar enough.

[Image: target and predicted mel-spectrograms]

To anyone who has used this repo and to @Jeevesh8 , how long does it normally take for the attention to start aligning properly? Should I continue training?

Any help and advice would be greatly appreciated.

@jinny1208 This is a commonly observed phenomenon [see here, for example]. The main reason, I think, is that the mel-spectrogram is predicted frame by frame (auto-regressively), so even if the model merely learns to copy the previous frame of the sequence (without learning anything about alignment), it can already lower the loss quite a bit. You need to train longer.
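The copy-the-previous-frame shortcut is easy to demonstrate on synthetic data: because adjacent frames of a spectrogram are highly correlated, predicting each frame as a copy of its predecessor achieves a much lower MSE than a no-information baseline, so the loss can drop well before attention aligns. A minimal sketch on toy data (not real mels):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "mel spectrogram": 80 mel bins x 200 frames, smoothed along time so
# that adjacent frames are strongly correlated, as in real speech.
noise = rng.standard_normal((80, 220))
kernel = np.ones(21) / 21
mel = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="valid"), 1, noise)

# Degenerate model: predict each frame as a copy of the previous frame.
copy_prev_mse = np.mean((mel[:, 1:] - mel[:, :-1]) ** 2)

# No-information baseline: predict the global mean everywhere.
mean_mse = np.mean((mel - mel.mean()) ** 2)

# Copying the previous frame beats the mean baseline by a wide margin,
# even though it uses no alignment information at all.
print(copy_prev_mse, mean_mse)
```

This is why a plausible-looking predicted mel-spectrogram is not, on its own, evidence that attention has formed.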

I would also suggest initialising from Tacotron 2 English pre-trained weights for faster alignment.
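Warm-starting from English weights requires skipping any tensors whose shapes changed, most notably the symbol embedding, whose vocabulary dimension grew when the Korean symbols were added. A hedged sketch of the filtering step (the function and layer names here are illustrative, not this repo's API); in PyTorch you would then do `state = model.state_dict(); state.update(kept); model.load_state_dict(state)`:

```python
def filter_warm_start(pretrained, own, ignore_layers=("embedding.weight",)):
    """Return the subset of pre-trained tensors that is safe to load into
    the new model's state dict: same key, same shape, and not explicitly
    ignored (e.g. the symbol embedding, whose first dimension grew when
    Korean symbols were added). Works on any mapping of name -> array-like
    with a .shape attribute, so it applies unchanged to PyTorch tensors.
    """
    return {
        name: tensor
        for name, tensor in pretrained.items()
        if name not in ignore_layers
        and name in own
        and tensor.shape == own[name].shape
    }
```

Layers that are skipped (the embedding, plus any layers new to the multilingual model) simply keep their fresh random initialisation, which is usually fine since the bulk of the encoder/decoder transfers.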

commented

Hi, thanks for your answer! I will train longer and update the results here.

commented

[Image: attention alignment plot showing partial alignment]

The alignment is working to some degree. I probably need to train longer and do something else to get clearer and more robust alignments.