A (Heavily Documented) TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
As of May 17, 2017, this is still a first draft. You can run it following the steps below, but probably you should get poor results. I'll be working on debugging this weekend. (Code reviews and/or contributions are more than welcome!)
- NumPy >= 1.11.1
- TensorFlow >= 1.0
- librosa
Since the original paper was based on their internal data, I use a freely available one, instead.
The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its text and audio recordings are freely avaiable here. Unfortunately, however, each of the audio files matches a chapter, not a verse, so is too long in most cases. I sliced them by verse manually. You can get them on my dropbox
- STEP 1. Adjust hyper parameters in
hyperparams.py
if necessary. - STEP 2. Download the data and extract it.
- STEP 3. Run
train.py
. - STEP 4. Run
eval.py
to get samples.