A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
niu0717 opened this issue 5 years ago
When I read the GST paper, I found that it captures not only the style tokens but also the tone of the speaker. In other words, can we separate the prosody from the reference audio as much as possible?