ADKosm / talking-machines

Pytorch implementation of Tacotron2, modern text-to-speech model

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Tacotron2

Pytorch implementation of Tacotron2, a modern text-to-speech model based on this paper

Usage

To convert mel spectrograms to audio we need Nvidia's pretrained Vocoder

! git clone https://github.com/NVIDIA/waveglow.git

! pip install googledrivedownloader

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(
    file_id='1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF',
    dest_path='./waveglow_256channels_universal_v5.pt'
)

Then run ./run_docker.sh with correct volume option

Training

Download LJSpeech dataset

Set preferred settings in config.py, then run python train.py

In wandb.ai will be logged:

  • Train and validation losses
  • Original text
  • Predicted and ground truth mel spectrograms
  • Predicted and ground truth audio
  • Probabilties of the last frame over the audio

Inference

python inference.py "Your text for speech synthesis"

The result will be logged in wandb.ai.

You can use my pretrained model:

gdd.download_file_from_google_drive(
    file_id='1gjOSUTyuFsdVOpPcLaEZjGHpgBEs_lTZ',
    dest_path='./tacotron.ptt'
)

About

Pytorch implementation of Tacotron2, modern text-to-speech model


Languages

Language:Python 98.1%Language:Dockerfile 1.4%Language:Shell 0.5%