syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

Can we synthesize speaker A's tone with speaker B's prosody?

niu0717 opened this issue

When I read the GST paper, I found that the style embedding contains not only the token information but also the speaker's tone. In other words, can we separate the prosody from the reference audio as much as possible?