beijinggao / FastSpeech

Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"

FastSpeech

Training

  1. Set data_path in hparams.py to the LJSpeech folder.
  2. Set teacher_dir in hparams.py to the directory where the alignments and mel-spectrogram targets are saved.
  3. Add the checkpoint of the pre-trained transformer-tts (the weights of its embedding and encoder layers are used).
  4. Run python train.py.
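
As a rough sketch of steps 1 and 2, the two entries in hparams.py might look like the following. The names data_path and teacher_dir come from the steps above; the path values and the missing_paths helper are hypothetical placeholders added for illustration and are not taken from the repository.

```python
import os

# Names come from the training steps; the values are placeholder assumptions.
data_path = "./data/LJSpeech-1.1"        # step 1: the LJSpeech folder
teacher_dir = "./data/teacher_outputs"   # step 2: alignments and mel targets


def missing_paths(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


if __name__ == "__main__":
    # Hypothetical sanity check to run before starting train.py.
    missing = missing_paths([data_path, teacher_dir])
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All configured directories exist.")
```

Checking the directories up front is only a convenience: it surfaces a mistyped path before training starts rather than partway through data loading.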

Training curves (orange: character / blue: phoneme)

The two runs use training sets of different sizes because the transformer-tts model trained on phonemes shows more diagonal attention, which leaves more usable samples for the phoneme run.
Split: train:val:test = 8:1:1; total samples: character 1126 / phoneme 3412.
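
An 8:1:1 split like the one described above can be sketched as follows. This is a minimal illustration, not the repository's code: the split_8_1_1 function, the fixed seed, the rounding rule (remainder goes to the test set), and treating each total as the number of items to divide are all assumptions.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle `items` and split them into train/val/test at roughly 8:1:1.

    Integer arithmetic avoids floating-point rounding; any remainder
    ends up in the test set (an assumption made for this sketch).
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    for name, total in (("character", 1126), ("phoneme", 3412)):
        train, val, test = split_8_1_1(range(total))
        print(name, len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets identical across runs, which matters when comparing training curves between configurations.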

Training plots (orange: batch_size:64 / blue: batch_size:32)

Audio Samples

You can hear the audio samples here

About

License: MIT License


Languages

Jupyter Notebook 99.8%, Python 0.1%, HTML 0.0%