- Spoken Content and Voice Factorization for Few-shot Speaker Adaptation
- Attentron: Few-shot Text-to-Speech Exploiting Attention-based Variable Length Embedding
- Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings
- End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE
- Low-Resource Expressive Text-to-Speech Using Data Augmentation