BridgetteSong / ExpressiveTacotron

This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including Multi-attentive Tacotron, DurIAN, Non-attentive Tacotron, GST, VAE, GMVAE, and X-vectors for building prosody encoders.


Expressive Tacotron (implemented in PyTorch)

Introduction

This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including Multi-attentive Tacotron, DurIAN, and Non-attentive Tacotron.

The framework also includes several deep learning architectures for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.
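As a rough illustration of the GST approach, the sketch below implements a bank of learnable style tokens queried by a reference embedding via attention. This is a minimal, hypothetical example assuming PyTorch; the class name, layer sizes, and token count are illustrative assumptions, not the values used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleTokenLayer(nn.Module):
    """Minimal GST-style layer: a learnable style token bank queried by a
    reference embedding via single-head attention (illustrative sizes)."""
    def __init__(self, ref_dim=128, token_dim=256, num_tokens=10):
        super().__init__()
        # Learnable style tokens, shared across all utterances.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim) * 0.3)
        self.query_proj = nn.Linear(ref_dim, token_dim)

    def forward(self, ref_embedding):
        # ref_embedding: (batch, ref_dim), e.g. a reference-encoder summary.
        query = self.query_proj(ref_embedding)             # (batch, token_dim)
        keys = torch.tanh(self.tokens)                     # (num_tokens, token_dim)
        scores = query @ keys.t() / keys.size(-1) ** 0.5   # (batch, num_tokens)
        weights = F.softmax(scores, dim=-1)
        # Style embedding: attention-weighted sum of the token bank.
        return weights @ keys                              # (batch, token_dim)

ref = torch.randn(4, 128)            # stand-in for reference-encoder output
style = StyleTokenLayer()(ref)
print(style.shape)                   # torch.Size([4, 256])
```

The resulting style embedding is typically broadcast along the time axis and concatenated with (or added to) the text-encoder outputs to condition the decoder.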

  • This repository provides only the core model files; data-preparation, training, and synthesis scripts are not included
  • For training scripts, see ExpressiveTacotron

Available recipes

Expressive Mode

Framework Mode

Differences

  • Non-attentive Tacotron: the outputs of the duration predictor's stacked convolution layers are concatenated with the encoder outputs
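The Non-attentive Tacotron variant above could be sketched as follows. This is a hedged, self-contained example assuming PyTorch; the module name, layer count, and dimensions are illustrative assumptions rather than the repository's actual configuration.

```python
import torch
import torch.nn as nn

class DurationPredictor(nn.Module):
    """Sketch of the variant described above: stacked 1D convolutions run
    over the encoder outputs, and their features are concatenated back
    onto the encoder outputs before predicting per-token durations."""
    def __init__(self, enc_dim=256, conv_dim=128, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.convs = nn.Sequential(
            nn.Conv1d(enc_dim, conv_dim, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel, padding=pad), nn.ReLU(),
        )
        # Predict one duration value per encoder step from the concatenation.
        self.proj = nn.Linear(enc_dim + conv_dim, 1)

    def forward(self, enc_out):
        # enc_out: (batch, time, enc_dim) text-encoder outputs.
        conv_feats = self.convs(enc_out.transpose(1, 2)).transpose(1, 2)
        concat = torch.cat([enc_out, conv_feats], dim=-1)  # (batch, time, enc_dim + conv_dim)
        return self.proj(concat).squeeze(-1)               # (batch, time)

enc = torch.randn(2, 17, 256)
durations = DurationPredictor()(enc)
print(durations.shape)               # torch.Size([2, 17])
```

The predicted durations would then drive length regulation (upsampling encoder states to frame rate) in place of an attention mechanism.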

Acknowledgements

This implementation uses code from the following repositories: NVIDIA, ESPNet, ERISHA, and ForwardAttention.
