G-Wang / Tacotron-pytorch-1

A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model

An implementation of Google's Tacotron TTS system in PyTorch.

Updates

2018/09/15: Fix RNN feeding bug.
2018/11/04: Add attention mask and loss mask.
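
As an illustration of what a loss mask does, here is a minimal sketch in plain Python. It is not the code from this repository: the names `sequence_mask` and `masked_l1_loss` are hypothetical, and nested lists stand in for tensors. The idea is that padded frames at the end of shorter utterances should not contribute to the loss.

```python
def sequence_mask(lengths, max_len=None):
    """Return 0/1 rows: 1 marks a real frame, 0 marks padding."""
    if max_len is None:
        max_len = max(lengths)
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


def masked_l1_loss(pred, target, lengths):
    """Mean absolute error computed over non-padded frames only.

    pred and target are lists of equal-length rows (batch x time);
    lengths gives the true number of frames in each row.
    """
    mask = sequence_mask(lengths, max_len=len(pred[0]))
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * abs(p - t)  # padded positions add nothing
            count += m
    # Guard against an all-padding batch (hypothetical edge case).
    return total / max(count, 1)
```

In a tensor framework the same mask would typically be multiplied into the element-wise loss before averaging.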

Requirements

Install Python and PyTorch.

  • python==3.6.5
  • pytorch==0.4.1

You can use requirements.txt to install the packages below.

# I recommend you use virtualenv.
$ pip install -r requirements.txt
  • librosa
  • numpy
  • pandas
  • scipy
  • matplotlib

Usage

  • Data
    Download LJSpeech provided by keithito. It contains 13,100 short audio clips of a single speaker. The total length is approximately 24 hrs.

  • Set config.

# Set 'meta_path' and 'wav_dir' in `hyperparams.py` to the paths of your downloaded LJSpeech metadata file and wav directory.
meta_path = 'Data/LJSpeech-1.1/metadata.csv'
wav_dir = 'Data/LJSpeech-1.1/wavs'
  • Train
# If you have a pretrained model, add --ckpt <ckpt_path>
$ python main.py --train --cuda
  • Evaluate
# You can change the evaluation texts in `hyperparams.py`
# ckpt files are saved in 'tmp/ckpt/' by default
$ python main.py --eval --cuda --ckpt <ckpt_timestep.pth.tar>
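
Before starting a long training run, it can help to confirm that `meta_path` and `wav_dir` point at usable data. The sketch below is not part of this repository; `load_metadata` and `missing_wavs` are hypothetical helper names. It assumes the standard LJSpeech layout, where each line of metadata.csv is `id|transcription|normalized transcription` and each id has a matching `.wav` file.

```python
import csv
import os


def load_metadata(meta_path, wav_dir):
    """Read LJSpeech-style metadata and return (wav_path, text) pairs."""
    pairs = []
    with open(meta_path, encoding="utf-8", newline="") as f:
        # Fields are pipe-separated; quotes in the text are literal characters.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            clip_id, text = row[0], row[-1]  # last field: normalized text
            pairs.append((os.path.join(wav_dir, clip_id + ".wav"), text))
    return pairs


def missing_wavs(pairs):
    """Return the wav paths listed in the metadata that do not exist on disk."""
    return [path for path, _ in pairs if not os.path.isfile(path)]


if __name__ == "__main__":
    # Paths copied from the config example above; adjust to your setup.
    meta_path = "Data/LJSpeech-1.1/metadata.csv"
    wav_dir = "Data/LJSpeech-1.1/wavs"
    if os.path.isfile(meta_path):
        pairs = load_metadata(meta_path, wav_dir)
        print(len(pairs), "entries,", len(missing_wavs(pairs)), "missing wav files")
    else:
        print("metadata file not found:", meta_path)
```

If the second number printed is not zero, the `wav_dir` setting most likely points at the wrong directory.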

Samples

The sample texts are based on the Harvard Sentences. See the samples in samples/, which were generated after 200k training steps.

Alignment

The model starts to learn an alignment at around 30k steps.

Differences from the original Tacotron

  1. Data bucketing (Original Tacotron used loss mask)
  2. Remove residual connection in decoder_CBHG
  3. Batch size is set to 8
  4. Gradient clipping
  5. Noam-style learning rate decay (the schedule used in "Attention Is All You Need")
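
To make item 5 concrete, here is a minimal sketch of a Noam-style schedule in plain Python. It is an illustration, not the code from this repository: the function name `noam_lr` and the values `base_lr=0.001` and `warmup_steps=4000` are assumptions. This variant is scaled so that the rate peaks at `base_lr` when `step == warmup_steps`, whereas the paper's version scales by the model dimension instead.

```python
def noam_lr(step, base_lr=0.001, warmup_steps=4000):
    """Linear warmup followed by inverse-square-root decay.

    base_lr and warmup_steps are illustrative defaults, not values
    taken from this repository.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 on the first call
    warmup_term = step * warmup_steps ** -1.5  # grows linearly during warmup
    decay_term = step ** -0.5                  # shrinks as 1/sqrt(step) afterwards
    return base_lr * warmup_steps ** 0.5 * min(warmup_term, decay_term)


if __name__ == "__main__":
    for s in (1, 1000, 4000, 16000, 64000):
        print(s, noam_lr(s))
```

In a training loop the returned value would be written to each optimizer parameter group before the update step; gradient clipping (item 4) is applied separately to the gradients.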

References

  1. (Tensorflow) Kyubyong's implementation
  2. (Tensorflow) acetylSv's implementation
  3. (Pytorch) soobinseo's implementation

Finally, this work is heavily based on Kyubyong's work, so if you are a TensorFlow user, you may want to look at his implementation. Also, feel free to give feedback!

License:MIT License


Languages

Language:Python 100.0%