Modified HiFi-GAN for Assem-VC

Pretrained Model

To GTA fine-tune HiFi-GAN models, download the pretrained models and transfer from their weights.

You can use the pretrained UNIVERSAL_V1 model that the authors of HiFi-GAN provide.
Download pretrained models
The details of each folder are as follows:

| Folder Name | Generator | Dataset | Fine-Tuned |
|---|---|---|---|
| LJ_V1 | V1 | LJSpeech | No |
| LJ_V2 | V2 | LJSpeech | No |
| LJ_V3 | V3 | LJSpeech | No |
| LJ_FT_T2_V1 | V1 | LJSpeech | Yes (Tacotron2) |
| LJ_FT_T2_V2 | V2 | LJSpeech | Yes (Tacotron2) |
| LJ_FT_T2_V3 | V3 | LJSpeech | Yes (Tacotron2) |
| VCTK_V1 | V1 | VCTK | No |
| VCTK_V2 | V2 | VCTK | No |
| VCTK_V3 | V3 | VCTK | No |
| UNIVERSAL_V1 | V1 | Universal | No |

Make the cp_hifigan directory.
```
mkdir cp_hifigan
```
Download g_02500000 and do_02500000 from the following link and place them in the cp_hifigan/ directory.
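
Before starting fine-tuning, it can help to confirm that both checkpoint files are in place. The following is a minimal sketch of such a check; the helper name `missing_checkpoints` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names taken from the download step above.
REQUIRED_CHECKPOINTS = ("g_02500000", "do_02500000")


def missing_checkpoints(checkpoint_dir="cp_hifigan"):
    """Return the required checkpoint file names absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from cp_hifigan/:", ", ".join(missing))
    else:
        print("Both pretrained checkpoints found.")
```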

Fine-Tuning

Generate GTA mel-spectrograms in torch.Tensor format using Assem-VC.
The file name of each generated mel-spectrogram should be the full name of its audio file, including the .wav extension, with .gta appended.
Example:
```
Audio File : p233_392.wav
Mel-Spectrogram File : p233_392.wav.gta
```
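
The naming rule can be sketched in Python as follows. This is an illustration only: the helper names `gta_path_for` and `wavs_without_gta` are hypothetical, and the sketch assumes each mel-spectrogram is stored next to its audio file, which may not match your layout.

```python
from pathlib import Path


def gta_path_for(wav_path):
    """Return the expected GTA mel path: the audio file name with '.gta' appended."""
    wav_path = Path(wav_path)
    return wav_path.with_name(wav_path.name + ".gta")


def wavs_without_gta(wavs_root):
    """List .wav files under wavs_root that have no sibling '<name>.wav.gta' file.

    Assumption: GTA mels sit in the same directory as their audio files.
    """
    return sorted(
        wav for wav in Path(wavs_root).rglob("*.wav")
        if not gta_path_for(wav).is_file()
    )


print(gta_path_for("p233_392.wav"))  # p233_392.wav.gta
```

Running `wavs_without_gta` on the audio root before training gives a quick list of audio files whose GTA mel-spectrograms have not been generated yet.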

Run the following command.

```
python train.py --config config_v1.json \
                --input_wavs_dir <root_path_of_input_audios> \
                --input_mels_dir <root_path_of_GTA_mels> \
                --input_training_file <absolute_path_of_train_metadata_of_gta_mels> \
                --input_validation_file <absolute_path_of_val_metadata_of_gta_mels> \
                --fine_tuning True
```

To train the V2 or V3 Generator, replace config_v1.json with config_v2.json or config_v3.json.
Checkpoints and a copy of the configuration file are saved in the cp_hifigan directory by default.
You can change the path by adding the --checkpoint_path option.
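
As a rough sketch of how the generator choice and the --checkpoint_path option fit into the command, the snippet below assembles the argument list in Python. The function `build_finetune_command` and the directory name `cp_hifigan_v2` are hypothetical; only the flags themselves come from this README.

```python
import shlex


def build_finetune_command(wavs_dir, mels_dir, train_file, val_file,
                           generator="v1", checkpoint_path=None):
    """Assemble the fine-tuning command line as a list of arguments."""
    if generator not in ("v1", "v2", "v3"):
        raise ValueError("generator must be 'v1', 'v2' or 'v3'")
    args = [
        "python", "train.py",
        "--config", f"config_{generator}.json",
        "--input_wavs_dir", wavs_dir,
        "--input_mels_dir", mels_dir,
        "--input_training_file", train_file,
        "--input_validation_file", val_file,
        "--fine_tuning", "True",
    ]
    if checkpoint_path is not None:
        # Leaving this flag out keeps the default cp_hifigan directory.
        args += ["--checkpoint_path", checkpoint_path]
    return args


# Hypothetical example: V2 generator, checkpoints written to cp_hifigan_v2.
command = build_finetune_command(
    "../datasets/", "../datasets/",
    "../datasets/gta_metadata/gta_vctk_22k_train_10s_g2p.txt",
    "../datasets/gta_metadata/gta_vctk_22k_val_g2p.txt",
    generator="v2", checkpoint_path="cp_hifigan_v2",
)
print(shlex.join(command))
```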

Here is an example command that may help you understand the arguments:

```
python train.py --config config_v1.json \
                --input_wavs_dir ../datasets/ \
                --input_mels_dir ../datasets/ \
                --input_training_file ../datasets/gta_metadata/gta_vctk_22k_train_10s_g2p.txt \
                --input_validation_file ../datasets/gta_metadata/gta_vctk_22k_val_g2p.txt \
                --fine_tuning True
```

Monitoring via Tensorboard

```
tensorboard --logdir cp_hifigan/logs --bind_all
```

Acknowledgements

We referred to HiFi-GAN, WaveGlow, MelGAN and Tacotron2 to implement this.
