Detail TTS

This model proposes three methods intended as best practice for autoregressive (AR) TTS:

Fake discretization. Although RVQ is used, the actual training operates on continuous features; I call this fake discretization.
All in one model. The model combines GPT, diffusion, VQ-VAE, GAN, and flow-VAE in a single model, with one training stage and one inference stage.
Prefix plus prompt. Both a prefixed speaker embedding and a prompt are used, to benefit from both VALL-E-style inference and Tortoise-style training.
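
As a rough illustration of the first point, here is a minimal pure-Python sketch of residual vector quantization (RVQ) that returns both the discrete codes and the continuous vector they sum to. It is not the code from this repo: the random codebooks, the sizes, and the names `rvq_encode` and `training_input` are made up for the example, and reading "fake discretization" as "feed the float vector downstream instead of the integer codes" is an assumption.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(vec)
    quantized = [0.0] * len(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        entry = codebook[idx]
        # Accumulate the chosen entry and remove it from the residual.
        quantized = [q + e for q, e in zip(quantized, entry)]
        residual = [r - e for r, e in zip(residual, entry)]
    return codes, quantized

# Hypothetical toy setup: 3 stages, 8 entries per codebook, 4-dim features.
rng = random.Random(0)
dim, n_stages, codebook_size = 4, 3, 8
codebooks = [
    [[rng.gauss(0, 1.0 / (stage + 1)) for _ in range(dim)]
     for _ in range(codebook_size)]
    for stage in range(n_stages)
]
feature = [rng.gauss(0, 1) for _ in range(dim)]

codes, quantized = rvq_encode(codebooks, feature)

# Discrete view: one integer index per stage.
# Continuous view: the float vector those indices sum to.
# Assumption for this sketch: the downstream model trains on the float vector.
training_input = quantized
```

The integer `codes` stay available for storage or sampling, while `training_input` is an ordinary float vector, which is the distinction the sketch is meant to show.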

Demo

See api.py.

accelerate launch vqvae/train_tts.py

For fine-tuning, change the load path of the pretrained model.

The VQ and VITS components come from gsv.

The diffusion and GPT components come from tortoise.

For the NAR version, see ttts.

For the SVC version, see detail-vc.

All generative models in one for a better TTS model.
