Expressive Tacotron (PyTorch implementation)
Introduction
This repository provides a multi-mode, multi-speaker expressive speech synthesis framework that includes multi-attentive Tacotron, DurIAN, and Non-attentive Tacotron.
It also provides several architectures for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.
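To make the prosody-encoder idea concrete, here is a minimal sketch of a GST-style module, assuming the common design from the GST literature (a reference encoder summarizing a mel spectrogram, followed by attention over a bank of learned style tokens). The class name, dimensions, and the simplified single-GRU reference encoder are illustrative assumptions, not this repository's actual code.

```python
import torch
import torch.nn as nn

class GSTProsodyEncoder(nn.Module):
    """Illustrative GST sketch: a reference encoder compresses a mel
    spectrogram into one vector, which then attends over a bank of
    learned style tokens to produce a style embedding."""
    def __init__(self, n_mels=80, ref_dim=128, n_tokens=10, token_dim=256):
        super().__init__()
        # Reference encoder: GRU over mel frames (the usual conv stack
        # before the GRU is omitted here for brevity)
        self.ref_rnn = nn.GRU(n_mels, ref_dim, batch_first=True)
        # Bank of learnable style tokens
        self.tokens = nn.Parameter(torch.randn(n_tokens, token_dim))
        self.query = nn.Linear(ref_dim, token_dim)

    def forward(self, mels):              # mels: (B, T, n_mels)
        _, h = self.ref_rnn(mels)         # h: (1, B, ref_dim)
        q = self.query(h.squeeze(0))      # (B, token_dim)
        scores = q @ self.tokens.t()      # (B, n_tokens)
        weights = torch.softmax(scores, dim=-1)
        # Style embedding: weighted sum of (squashed) tokens
        return weights @ torch.tanh(self.tokens)  # (B, token_dim)

enc = GSTProsodyEncoder()
style = enc(torch.randn(2, 100, 80))
print(style.shape)  # torch.Size([2, 256])
```

The VAE/GMVAE variants replace the deterministic token attention with a sampled latent; the X-vector variant instead conditions on a pretrained speaker embedding.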
- Only the core model files are provided; data preparation, training, and synthesis scripts are not included.
- See ExpressiveTacotron for training scripts.
Available recipes
Expressive Mode
- Global Style Token (GST)
- Variational Autoencoder (VAE)
- Gaussian Mixture Variational Autoencoder (GMVAE)
- X-vector

Framework Mode
- Tacotron2
- ForwardAttention
- DurIAN
- Non-attentive Tacotron
- GMMv2 Attention
- Dynamic Convolution Attention (TODO)
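Of the attention mechanisms listed, GMMv2 is perhaps the least familiar. A rough sketch of the idea, assuming the usual mixture-of-Gaussians formulation (per-step deltas accumulated into monotonic means), is shown below; all names and dimensions are illustrative, not this repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMMAttentionV2(nn.Module):
    """Illustrative GMM attention sketch: the decoder query predicts
    per-mixture weights, mean deltas, and widths; means are accumulated
    across decoder steps, which makes the alignment monotonic."""
    def __init__(self, query_dim=512, n_mix=5):
        super().__init__()
        self.mlp = nn.Linear(query_dim, 3 * n_mix)
        self.n_mix = n_mix

    def forward(self, query, memory, mu_prev):
        # query: (B, query_dim), memory: (B, T, D), mu_prev: (B, n_mix)
        w, delta, sigma = self.mlp(query).chunk(3, dim=-1)
        w = torch.softmax(w, dim=-1)          # mixture weights
        mu = mu_prev + F.softplus(delta)      # monotonically advancing means
        sigma = F.softplus(sigma) + 1e-5      # positive widths
        pos = torch.arange(memory.size(1), device=memory.device).float()
        # Gaussian responsibilities over memory positions: (B, n_mix, T)
        phi = torch.exp(-0.5 * ((pos[None, None] - mu[..., None])
                                / sigma[..., None]) ** 2)
        align = (w[..., None] * phi).sum(1)   # alignment: (B, T)
        align = align / (align.sum(-1, keepdim=True) + 1e-8)
        context = align[:, None] @ memory     # (B, 1, D)
        return context.squeeze(1), align, mu  # mu is carried to the next step
```

At each decoder step the returned `mu` is fed back in as `mu_prev`, so attention can only move forward through the encoder memory.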
Differences
- Non-attentive Tacotron: the outputs of the duration predictor's stacked convolution layers are concatenated with the encoder outputs.
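The concatenation described above can be sketched as follows. This is a minimal illustration under assumed dimensions and layer counts, not the repository's actual duration predictor.

```python
import torch
import torch.nn as nn

class DurationPredictor(nn.Module):
    """Illustrative sketch of the variant described above: stacked 1-D
    convolutions over the encoder outputs, whose features are then
    concatenated with the encoder outputs before the final projection."""
    def __init__(self, enc_dim=512, conv_dim=256, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.convs = nn.Sequential(
            nn.Conv1d(enc_dim, conv_dim, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel, padding=pad), nn.ReLU(),
        )
        # The projection sees conv features plus the raw encoder outputs
        self.proj = nn.Linear(conv_dim + enc_dim, 1)

    def forward(self, enc_out):                  # enc_out: (B, T, enc_dim)
        x = self.convs(enc_out.transpose(1, 2))  # (B, conv_dim, T)
        x = torch.cat([x.transpose(1, 2), enc_out], dim=-1)
        return self.proj(x).squeeze(-1)          # per-token durations: (B, T)

dp = DurationPredictor()
durs = dp(torch.randn(2, 50, 512))
print(durs.shape)  # torch.Size([2, 50])
```

The skip-style concatenation lets the projection see both the locally smoothed conv features and the unmodified encoder states for each token.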
Acknowledgements
This implementation uses code from the following repositories: NVIDIA, ESPnet, ERISHA, ForwardAttention.