This model introduces three methods intended to establish a best practice for autoregressive (AR) TTS:
- Fake discretization. Although RVQ is used, the actual training employs continuous features rather than discrete tokens.
- All-in-one model. A single model contains the GPT, diffusion, VQ-VAE, GAN, and flow-VAE components: one training run, one inference pass.
- Both a prefixed speaker embedding and a prompt are used, combining the benefits of VALL-E-style inference and Tortoise-style training.
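
The "fake discretization" idea can be illustrated with a toy residual vector quantizer. This is only a minimal sketch of one possible reading (downstream training consumes the continuous sum of the selected codewords instead of the discrete indices); it is not taken from this repository, and every name in it (`rvq_encode`, `codebooks`, `frame`) is hypothetical.

```python
import random


def nearest(codebook, vec):
    """Return (index, codeword) of the entry closest to vec (squared L2)."""
    best = min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )
    return best, codebook[best]


def rvq_encode(vec, codebooks):
    """Residual vector quantization of a single feature frame.

    Each stage quantizes the residual left by the previous stages.
    Returns the discrete indices and the continuous reconstruction,
    which is the sum of the selected codewords.
    """
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx, code = nearest(codebook, residual)
        indices.append(idx)
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, recon


# Toy setup: random codebooks, with later stages covering finer residuals.
random.seed(0)
dim, n_stages, n_codes = 4, 3, 8
codebooks = [
    [[random.gauss(0, 1.0 / (s + 1)) for _ in range(dim)] for _ in range(n_codes)]
    for s in range(n_stages)
]
frame = [random.gauss(0, 1) for _ in range(dim)]

indices, continuous = rvq_encode(frame, codebooks)

# A conventional AR model would be trained on `indices` (discrete tokens).
# Under the reading assumed here, training uses `continuous` instead.
print(indices, continuous)
```

The quantizer still exists and still produces indices, so the representation is bottlenecked by the codebooks, but the model downstream never has to predict or embed a discrete token.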
See `api.py`.
accelerate launch vqvae/train_tts.py
For fine-tuning, change the load path of the pretrained model.
The VQ and VITS components are from gsv.
The diffusion and GPT components are from tortoise.
For the NAR version, see ttts.
For the SVC version, see detail-vc.