syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

poor alignment when synthesizing long sentences

moonnee opened this issue

Thank you for your work! It helps a lot.
I want to ask whether your alignment is good when synthesizing sentences of more than 10 words, say about 20 words. The paper said 'the model fails when conditioned on the shorter source phrases, successfully aligns when conditioned on the longest input.' The reference audio clips I used are about 20 words long, but synthesis only works well for shorter sentences. Please find some samples attached. Btw, I trained on the Nancy and Blizzard 2017 datasets.
Could you give me some suggestions? Thank you.
samples.zip

Hi, you can try GMM attention. It works well, especially for long sentences.
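
For anyone unfamiliar with the idea, here is a minimal NumPy sketch of one GMM attention step, assuming the formulation from Graves (2013): the alignment is a mixture of Gaussians over encoder positions, and the mixture means can only move forward, which is likely why it holds up better on long inputs. This is not the code from this repo; the function name `gmm_attention_step` and the layout of `params` are made up for illustration, and in a real model `params` would come from a projection of the decoder state.

```python
import numpy as np

def gmm_attention_step(params, prev_means, num_positions):
    """One step of GMM attention (illustrative sketch, not the repo's code).

    params:        array of shape (3, K) holding raw, unconstrained values
                   for mixture weights, inverse widths, and mean increments.
    prev_means:    array of shape (K,), the mixture means from the last step.
    num_positions: number of encoder time steps T.
    """
    raw_w, raw_beta, raw_kappa = params
    weights = np.exp(raw_w)                  # mixture importance, always positive
    beta = np.exp(raw_beta)                  # inverse width, always positive
    means = prev_means + np.exp(raw_kappa)   # positive increment: means only move forward

    positions = np.arange(num_positions, dtype=np.float64)  # shape (T,)
    # (K, 1) against (T,) broadcasts to (K, T): one Gaussian bump per mixture.
    bumps = weights[:, None] * np.exp(-beta[:, None] * (means[:, None] - positions) ** 2)
    alignment = bumps.sum(axis=0)            # shape (T,), unnormalized attention weights
    return alignment, means

# Two consecutive decoder steps with a single mixture component (K = 1).
params = np.zeros((3, 1))
alignment, means = gmm_attention_step(params, np.zeros(1), num_positions=10)
alignment, means = gmm_attention_step(params, means, num_positions=10)
```

Because the new mean is the previous mean plus a strictly positive term, the attention window cannot jump backwards or skip around, unlike content-based attention. Note that the weights here are left unnormalized, as in the original formulation; some variants normalize them.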