atomicoo / tacotron2-mandarin

TensorFlow implementation of Chinese/Mandarin TTS (Text-to-Speech) based on the Tacotron-2 model.

For my latest work on speech synthesis (TTS), see ParallelTTS.


For a PyTorch implementation of Tacotron-2, see Tacotron2-PyTorch.


tacotron-2-mandarin

TensorFlow implementation of Google's Tacotron-2, a deep neural network architecture described in the paper "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions".

Repo Structure

tacotron-2-mandarin-griffin-lim
|--- datasets
|--- logs-Tacotron
     |--- eval-dir
     |--- plots
     |--- taco_pretrained
     |--- wavs
|--- papers
|--- prepare
|--- tacotron
     |--- models
     |--- utils
|--- tacotron_output
     |--- eval
     |--- logs-eval
          |--- plots
          |--- wavs
|--- training_data
     |--- audio
     |--- linear
     |--- mels

Samples

There are some synthesis samples here.

Pretrained

You can get the pretrained model here.

Quick Start

OS: Ubuntu 16.04

Step (0) - Git clone repository

git clone https://github.com/atomicoo/tacotron2-mandarin.git tacotron-2-mandarin-griffin-lim
cd tacotron-2-mandarin-griffin-lim/

Step (1) - Install dependencies

  1. Install Python 3 (tested with Python 3.5.5)

  2. Install TensorFlow (tested with TensorFlow 1.10.0)

  3. Install other dependencies

    pip install -r requirements.txt
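
Once the dependencies are installed, a quick sanity check of the interpreter and TensorFlow versions can save a failed training run later. The following is a minimal sketch, not part of this repository: the helper name `check_environment` and the minimum version `(3, 5)` are assumptions chosen to match the versions listed above.

```python
import importlib
import sys


def check_environment(min_python=(3, 5)):
    """Return a dict describing the interpreter and the TensorFlow install."""
    report = {
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "python_ok": sys.version_info[:2] >= min_python,
        "tensorflow_version": None,  # stays None if TensorFlow is missing
    }
    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow_version"] = getattr(tf, "__version__", "unknown")
    except ImportError:
        # TensorFlow is not installed in this environment.
        pass
    return report


if __name__ == "__main__":
    for key, value in sorted(check_environment().items()):
        print("%s: %s" % (key, value))
```

If `tensorflow_version` prints as `None`, TensorFlow is not importable from the Python interpreter you are using.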
    

Step (2) - Prepare dataset

  1. Download dataset BIAOBEI or THCHS-30

    After that, your directory tree should look like this:

    tacotron-2-mandarin-griffin-lim
    |--- ...
    |--- BZNSYP
         |--- ProsodyLabeling
              |--- 000001-010000.txt
         |--- Wave
    |--- ...
    
  2. Prepare dataset (default is BIAOBEI)

    python prepare_dataset.py
    

    To prepare THCHS-30 instead, pass the parameter --dataset=THCHS-30.

    This produces a BIAOBEI folder as follows:

    tacotron-2-mandarin-griffin-lim
    |--- ...
    |--- BIAOBEI
         |--- biaobei_48000
    |--- ...
    
  3. Preprocess dataset (default is BIAOBEI)

    python preprocess.py
    

    To preprocess THCHS-30 instead, pass the parameter --dataset=THCHS-30.

    This produces a training_data folder as follows:

    tacotron-2-mandarin-griffin-lim
    |--- ...
    |--- training_data
         |--- audio
         |--- linear
         |--- mels
         |--- train.txt
    |--- ...
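
The layout checks and the two commands from Step (2) can be sketched as a small helper. This is an illustrative sketch rather than a script shipped with the repository: the function names `missing_paths` and `dataset_commands` are hypothetical, and the expected paths are copied from the directory trees above. The `--dataset` flag is only added when a non-default dataset is requested, since the README documents only the `THCHS-30` value.

```python
import os
import sys

# Paths taken from the directory trees above, relative to the repository root.
BIAOBEI_RAW = [
    os.path.join("BZNSYP", "ProsodyLabeling", "000001-010000.txt"),
    os.path.join("BZNSYP", "Wave"),
]
TRAINING_DATA = [
    os.path.join("training_data", "audio"),
    os.path.join("training_data", "linear"),
    os.path.join("training_data", "mels"),
    os.path.join("training_data", "train.txt"),
]


def missing_paths(root, expected):
    """Return the entries of `expected` that do not exist under `root`."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


def dataset_commands(dataset=None):
    """Build the prepare/preprocess command lines for Step (2).

    With dataset=None the scripts run with their default (BIAOBEI).
    """
    extra = [] if dataset is None else ["--dataset=%s" % dataset]
    return [
        [sys.executable, "prepare_dataset.py"] + extra,
        [sys.executable, "preprocess.py"] + extra,
    ]


if __name__ == "__main__":
    root = os.getcwd()
    print("missing raw files:", missing_paths(root, BIAOBEI_RAW))
    for cmd in dataset_commands():
        print("would run:", " ".join(cmd))
    print("missing outputs:", missing_paths(root, TRAINING_DATA))
```

Each command list can be passed to `subprocess.check_call` from the repository root. An empty "missing outputs" list after both commands finish suggests preprocessing completed.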
    

Step (3) - Train tacotron model

python train.py

For more parameters, see train.py.

Training produces a logs-Tacotron folder as follows:

tacotron-2-mandarin-griffin-lim
|--- ...
|--- logs-Tacotron
     |--- eval-dir
     |--- plots
     |--- taco_pretrained
     |--- wavs
|--- ...
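
To see how far training has progressed, you can look for the newest checkpoint on disk. The sketch below rests on two assumptions that the README does not state: that checkpoints are written to `logs-Tacotron/taco_pretrained`, and that they follow TensorFlow's usual `<prefix>-<step>.index` naming. The helper `latest_checkpoint` is a hypothetical name and uses only the standard library, so it runs without TensorFlow installed.

```python
import os
import re

# Assumption: checkpoints use TensorFlow's usual "<prefix>-<step>.index" naming.
_CKPT_INDEX = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)\.index$")


def latest_checkpoint(ckpt_dir):
    """Return (step, checkpoint_prefix) for the highest step, or None."""
    if not os.path.isdir(ckpt_dir):
        return None
    best = None
    for name in os.listdir(ckpt_dir):
        match = _CKPT_INDEX.match(name)
        if match is None:
            continue  # skip the "checkpoint" file, data shards, and so on
        step = int(match.group("step"))
        if best is None or step > best[0]:
            # Strip ".index" to get the prefix that TensorFlow restores from.
            best = (step, os.path.join(ckpt_dir, name[: -len(".index")]))
    return best


if __name__ == "__main__":
    found = latest_checkpoint(os.path.join("logs-Tacotron", "taco_pretrained"))
    print("no checkpoint found" if found is None else "step %d: %s" % found)
```

With TensorFlow available, `tf.train.latest_checkpoint` serves a similar purpose by reading the `checkpoint` file in that directory.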

Step (4) - Synthesize audio

python synthesize.py

For more parameters, see synthesize.py.

Synthesis produces a tacotron_output folder as follows:

tacotron-2-mandarin-griffin-lim
|--- ...
|--- tacotron_output
     |--- eval
     |--- logs-eval
          |--- plots
          |--- wavs
|--- ...
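
A quick way to confirm that synthesis produced usable audio is to list the generated files with their durations. This is a minimal sketch under stated assumptions: it assumes the wavs land in `tacotron_output/logs-eval/wavs` as in the tree above and that they are PCM WAV files, which is all the standard-library `wave` module can read. The function name `wav_durations` is hypothetical.

```python
import os
import wave


def wav_durations(wav_dir):
    """Map each .wav file name in `wav_dir` to its duration in seconds."""
    durations = {}
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(wav_dir, name), "rb") as wav_file:
            frames = wav_file.getnframes()
            rate = wav_file.getframerate()
        durations[name] = frames / float(rate)
    return durations


if __name__ == "__main__":
    wav_dir = os.path.join("tacotron_output", "logs-eval", "wavs")
    if os.path.isdir(wav_dir):
        for name, seconds in wav_durations(wav_dir).items():
            print("%s: %.2f s" % (name, seconds))
    else:
        print("no such directory:", wav_dir)
```

Files with a duration near zero, or far longer than the input sentence would suggest, are worth listening to first.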

References & Resources

Rayhane-mamah/Tacotron-2

License: MIT License
