andi611 / Conditional-SpecGAN-Tensorflow

Text-to-Speech Synthesis by Generating Spectrograms using Generative Adversarial Network

Conditional SpecGAN

A (conditional) audio synthesis generative adversarial network that generates spectrograms, which are then used to synthesize raw waveforms. Implemented in TensorFlow.

Requirements:

  • Tensorflow r1.10.1
  • Python 3.6
  • numpy 1.14.5
  • librosa 0.6.2
  • tqdm 4.26.0
  • matplotlib 2.2.3

Introduction

This project performs text-to-speech synthesis by generating spectrograms with a generative adversarial network. It is based on the original implementation of SpecGAN, which I extend by exploring conditional SpecGAN training. Additionally, an energy-based data preprocessing scheme is applied, which results in an improvement in audio quality.
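As a rough illustration of what an energy-based preprocessing step can look like, the sketch below trims leading and trailing low-energy frames from a clip. It is not the scheme implemented in './src/utils/preprocess_data.py'; the function names, frame length, hop size, and relative threshold are all assumptions chosen for the example.

```python
import numpy as np


def frame_energy(signal, frame_len=256, hop=128):
    """Return the mean squared amplitude of each frame (hypothetical helper)."""
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    return np.array([
        np.mean(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def trim_by_energy(signal, frame_len=256, hop=128, rel_threshold=0.05):
    """Keep the span between the first and last frame whose energy
    exceeds rel_threshold times the peak frame energy.

    The threshold value is an assumption, not taken from the repository.
    """
    energy = frame_energy(signal, frame_len, hop)
    active = np.flatnonzero(energy > rel_threshold * energy.max())
    if active.size == 0:
        return signal  # silent clip: no frame stands out, so keep it as-is
    start = active[0] * hop
    end = min(len(signal), active[-1] * hop + frame_len)
    return signal[start:end]
```

Applied to a one-second clip with silence on both sides of a spoken digit, a function like this would return only the region around the speech, which could then be padded or cropped to a fixed length before computing spectrograms.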

The result of the preprocessing is shown in the following visualization:

Build Dataset

  • Download training data: here

  • Run './src/utils/preprocess_data.py' to process data or download the processed data: here

  • Run './src/utils/visualize_wav.py' to visualize the processed clean data or download the results: here

  • Run './src/utils/make_tfrecord.py' to convert the .wav files into training-ready .tfrecord files, or download the processed data: here

  • Extract the .tgz file from step 4 and place its contents in the path given by args.data_dir in ./src/config.py:

data_dir='../data/sc09_preprocess_energy'

This default path can be changed through the '--data_dir' option in './src/config.py'.
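For readers unfamiliar with this pattern, a minimal sketch of how such an option is commonly declared with argparse is shown below. The function name get_config and the help text are hypothetical; only the option name and the default path come from this README, and the real './src/config.py' may be organized differently.

```python
import argparse


def get_config(argv=None):
    """Parse options; the default mirrors the data_dir path quoted above."""
    parser = argparse.ArgumentParser(description="Hypothetical config sketch")
    parser.add_argument(
        "--data_dir",
        type=str,
        default="../data/sc09_preprocess_energy",
        help="directory holding the extracted .tfrecord files",
    )
    # argv=None makes argparse read the real command line.
    return parser.parse_args(argv)
```

With a definition like this, the default can be changed either by editing the default value in the file or by passing a different path on the command line.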

Usage

  • Train a new SpecGAN model, or resume training an existing one, with the following command:
python3 ./src/runner.py train
  • To run inference and generate samples from a trained SpecGAN model, use the following command:
python3 ./src/runner.py generate
  • To train or generate from a conditional SpecGAN, use the following commands (Note: this feature is still being implemented and is not complete!):
python3 ./src/runner.py train --conditional
python3 ./src/runner.py generate --conditional
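The commands above follow a "mode plus optional flag" shape. The sketch below shows one way such a command line could be parsed and dispatched; it is an illustration only, not the code in './src/runner.py', and the function names parse_args and describe are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a positional mode and an optional --conditional flag."""
    parser = argparse.ArgumentParser(description="Hypothetical runner sketch")
    parser.add_argument("mode", choices=["train", "generate"])
    parser.add_argument("--conditional", action="store_true")
    return parser.parse_args(argv)


def describe(args):
    """Return a short description of the action selected by the arguments."""
    model = "conditional SpecGAN" if args.conditional else "SpecGAN"
    action = "train" if args.mode == "train" else "generate from"
    return f"{action} {model}"
```

In a real runner, describe would be replaced by calls into the training or generation code; here it only reports which path was selected.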

License:MIT License
