This is a VITS-based TTS model that controls the emotion of the output speech through natural language and the speaker identity through reference audio.