Hi-Fi GAN Re-implementation

This repo contains a reimplementation of the neural vocoder from the Hi-Fi GAN paper. The model differs slightly from the original in two ways. First, we use a much larger multi-period discriminator with an additional sub-final layer. Second, we average the residual blocks' outputs in the generator instead of adding them, which leads to better stability during training.
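To make the second difference concrete, here is a minimal sketch contrasting the two ways of combining the outputs of parallel residual blocks. It is an illustration only, not code from this repo: plain Python lists stand in for tensors, and the function names `combine_sum` and `combine_mean` are hypothetical.

```python
# Illustrative sketch only; the names below are hypothetical and not from this repo.
# Plain lists stand in for the tensors a real generator would use.

def combine_sum(block_outputs):
    """Element-wise sum of the parallel residual blocks' outputs."""
    return [sum(values) for values in zip(*block_outputs)]


def combine_mean(block_outputs):
    """Element-wise average of the parallel residual blocks' outputs."""
    n_blocks = len(block_outputs)
    return [sum(values) / n_blocks for values in zip(*block_outputs)]


# Three parallel residual blocks, each producing a 4-sample feature vector.
outputs = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
]

summed = combine_sum(outputs)     # magnitude grows with the number of blocks
averaged = combine_mean(outputs)  # magnitude stays at the scale of one block
```

One visible effect of averaging is that the combined output keeps the scale of a single block regardless of how many blocks run in parallel, whereas the sum grows with the block count.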

Preprocessing steps rely heavily on the official implementation and Mel-GAN.

Setup

First, clone the repo:

git clone https://github.com/Mikezz1/hifi-gan

Then install all dependencies:

cd hifi-gan
pip3 install -r requirements

Finally, download the model checkpoint (if the file is unavailable, use the Google Drive link):

sh load_checkpoint.sh

Training

To start training, run the following script. It takes one epoch to get distinguishable words, 4-5 epochs to get rid of the robotic voice, and at least 20 epochs to achieve mostly clean sound.

python3 train.py --config="configs/base_config.yaml"

Inference

To run the model on test samples, you first need to compute melspecs for the reference audios:

python3 prepare_test.py

Make sure that the reference audios audio_1.wav, audio_2.wav and audio_3.wav are in the data folder (or specify another path or other filenames inside the script). Then run the inference script:

python3 inference.py --config='path/to/config' --mel_filenames='test_spec'

The mel_filenames option specifies the filename pattern of the source melspecs (test_spec in the example above).

The config option is the path to the config. Make sure the config specifies the path to the checkpoint.
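Before running prepare_test.py, it can help to confirm that the reference audios are where the script expects them. The helper below is a hypothetical sketch, not part of this repo. It assumes the default `data` folder and the three filenames mentioned above; adjust both if you changed them inside the script.

```python
from pathlib import Path

# Filenames and folder as named in this README; the helper itself is
# hypothetical and not part of the repo.
REFERENCE_FILES = ["audio_1.wav", "audio_2.wav", "audio_3.wav"]


def missing_reference_audios(data_dir="data", filenames=REFERENCE_FILES):
    """Return the reference files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_reference_audios()
    if missing:
        print("Missing reference audios:", ", ".join(missing))
    else:
        print("All reference audios found.")
```

An empty result means all three files were found and prepare_test.py should be able to read them.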
