beijinggao / FastSpeech

Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"

Training

Set data_path in hparams.py to the LJSpeech folder
Set teacher_dir in hparams.py to the directory where the alignments and mel-spectrogram targets are saved
Provide the checkpoint of the pre-trained transformer-tts (the weights of its embedding and encoder layers are used)
Run python train.py
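
As a rough illustration of the first two steps, the relevant lines in hparams.py might look like the sketch below. Only the names data_path and teacher_dir come from this README; both path values are placeholders and must be replaced with your own locations.

```python
# hparams.py (excerpt): a minimal sketch, not the repository's actual file.
# The two variable names are taken from the README; the values are
# hypothetical placeholder paths.

# Folder that contains the LJSpeech dataset.
data_path = "/path/to/LJSpeech"

# Directory holding the alignments and mel-spectrogram targets
# saved from the teacher model.
teacher_dir = "/path/to/teacher_outputs"
```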

Training curves (orange: character / blue: phoneme)

The character and phoneme training sets differ in size because the transformer-tts trained on phonemes shows more diagonal attention, so more of its alignments are usable as targets
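
To make "more diagonal attention" concrete, one possible way to score an attention matrix is the share of attention weight that lies close to the diagonal. The sketch below is an assumption for illustration only: the function name diagonal_focus_rate, the band parameter, and the metric itself are hypothetical and are not taken from this repository.

```python
def diagonal_focus_rate(attn, band=0.15):
    """Return the fraction of attention weight near the diagonal.

    attn is a list of rows (decoder steps), each a list of weights over
    encoder steps. band is the allowed distance from the diagonal,
    measured in normalized [0, 1] positions. Hypothetical metric.
    """
    n_dec = len(attn)
    n_enc = len(attn[0])
    near = 0.0
    total = 0.0
    for i, row in enumerate(attn):
        dec_pos = (i + 0.5) / n_dec  # normalized decoder position
        for j, weight in enumerate(row):
            enc_pos = (j + 0.5) / n_enc  # normalized encoder position
            total += weight
            if abs(dec_pos - enc_pos) <= band:
                near += weight
    return near / total


# A perfectly diagonal alignment versus a uniform (uninformative) one.
diagonal = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uniform = [[0.1] * 10 for _ in range(10)]
print(diagonal_focus_rate(diagonal))
print(diagonal_focus_rate(uniform))
```

Under this kind of score, utterances below some threshold could be excluded, which would explain why a teacher with sharper alignments leaves more usable data.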

train:val:test = 8:1:1; total samples: character 1126 / phoneme 3412
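
For reference, an 8:1:1 split of a sample list can be sketched as follows. The function name split_8_1_1, the seed, and the rounding rule (integer division, with the remainder going to the test set) are assumptions; the repository may shuffle or round differently.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test at roughly 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains after train and val
    return train, val, test


# Totals reported above: 1126 (character) and 3412 (phoneme).
for total in (1126, 3412):
    train, val, test = split_8_1_1(range(total))
    print(total, len(train), len(val), len(test))
```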

Training plots (orange: batch_size:64 / blue: batch_size:32)

Audio Samples

You can hear the audio samples here

About

MIT License
