xuerq / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

A (Heavily Documented) TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

Warning

As of May 17, 2017, this is still a first draft. You can run it by following the steps below, but you will probably get poor results. I'll be working on debugging this weekend. (Code reviews and/or contributions are more than welcome!)

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.0
  • librosa
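To confirm that an environment satisfies the minimums listed above, a small check along these lines may help. It is only a sketch: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are made up for illustration and are not part of this repository, and the version parsing is deliberately naive (it only compares the leading integers of each dotted component).

```python
import importlib


def parse_version(text):
    """Turn '1.11.1' or '1.0.0-rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Minimums copied from the list above; librosa has no stated minimum.
MINIMUMS = {"numpy": "1.11.1", "tensorflow": "1.0", "librosa": None}


def check_requirements(minimums=MINIMUMS):
    """Report 'ok', 'too old', or 'missing' for each required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "")
        if minimum is None or meets_minimum(installed, minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "too old ({} < {})".format(installed, minimum)
    return report
```

Calling `check_requirements()` returns a dictionary keyed by package name, which can be printed before starting a training run.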

Data

Since the original paper used the authors' internal data, I use a freely available dataset instead.

The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its text and audio recordings are freely available here. Unfortunately, each audio file covers a whole chapter rather than a single verse, so most files are too long to use directly. I sliced them by verse manually. You can get the sliced files on my Dropbox.
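The slicing here was done by hand, but for anyone who wants to repeat it on other chapters, the core operation can be sketched as follows. This is not the procedure used for the published files: the function name `slice_by_verse` is hypothetical, and the verse boundaries (start and end times in seconds) are assumed to come from somewhere else, for example manual annotation. The waveform can be any sliceable sequence, such as the NumPy array returned by `librosa.load`.

```python
def slice_by_verse(samples, sample_rate, boundaries):
    """Cut one chapter-length waveform into per-verse clips.

    samples:     sliceable sequence of audio samples (list or NumPy array)
    sample_rate: samples per second
    boundaries:  list of (start_sec, end_sec) pairs, one per verse
                 (assumed to be supplied by the user)
    """
    clips = []
    for start_sec, end_sec in boundaries:
        if not 0 <= start_sec < end_sec:
            raise ValueError(
                "invalid boundary: ({}, {})".format(start_sec, end_sec)
            )
        # Convert seconds to sample indices, clamping to the waveform length.
        start = int(round(start_sec * sample_rate))
        end = min(int(round(end_sec * sample_rate)), len(samples))
        clips.append(samples[start:end])
    return clips
```

Each returned clip can then be written out as its own audio file and paired with the text of the matching verse.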

Work Flow

  • STEP 1. Adjust the hyperparameters in hyperparams.py if necessary.
  • STEP 2. Download the data and extract it.
  • STEP 3. Run train.py.
  • STEP 4. Run eval.py to get samples.
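Steps 3 and 4 above can also be chained from a single script. The sketch below assumes that train.py and eval.py take no command-line arguments and are run from the repository root; the helper names `build_commands` and `run_workflow` are made up for illustration.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the commands for STEP 3 (training) and STEP 4 (sampling)."""
    return [
        [python, "train.py"],  # STEP 3: train the model
        [python, "eval.py"],   # STEP 4: generate samples
    ]


def run_workflow(python=sys.executable):
    """Run training, then evaluation, stopping at the first failure."""
    for command in build_commands(python):
        # check_call raises CalledProcessError on a non-zero exit code,
        # so eval.py is not run if training fails.
        subprocess.check_call(command)
```

Calling `run_workflow()` after the data has been downloaded and extracted (STEP 2) runs both scripts in order.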

About

License: Apache License 2.0

