wookladin / hifi-gan

Modified HiFi-GAN for Assem-VC

Pretrained Model

To GTA fine-tune HiFi-GAN, first download the pretrained models and initialize training from their weights.

You can use the pretrained UNIVERSAL_V1 model provided by the authors of HiFi-GAN.
Download pretrained models
Details of each folder are as follows:

Folder Name   Generator  Dataset    Fine-Tuned
LJ_V1         V1         LJSpeech   No
LJ_V2         V2         LJSpeech   No
LJ_V3         V3         LJSpeech   No
LJ_FT_T2_V1   V1         LJSpeech   Yes (Tacotron2)
LJ_FT_T2_V2   V2         LJSpeech   Yes (Tacotron2)
LJ_FT_T2_V3   V3         LJSpeech   Yes (Tacotron2)
VCTK_V1       V1         VCTK       No
VCTK_V2       V2         VCTK       No
VCTK_V3       V3         VCTK       No
UNIVERSAL_V1  V1         Universal  No

  1. Make the cp_hifigan directory.
    mkdir cp_hifigan
  2. Download g_02500000 and do_02500000 from the following link.
  3. Place them in the cp_hifigan/ directory.
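
Placing the files in cp_hifigan/ is what lets training start from the pretrained weights. Assuming this fork keeps upstream HiFi-GAN's resume behavior, train.py looks in the checkpoint directory for the newest file named with a prefix (g_ for the generator, do_ for the discriminators and optimizer state) followed by an 8-digit step count. The sketch below mirrors that lookup; the name scan_checkpoint follows upstream HiFi-GAN's utils.py, but treat it as an illustration rather than this repository's exact code.

```python
import glob
import os


def scan_checkpoint(cp_dir, prefix):
    """Return the newest checkpoint in cp_dir named prefix + 8 characters, or None."""
    # Checkpoints are assumed to look like g_02500000 / do_02500000:
    # a prefix followed by a zero-padded, 8-digit training step.
    pattern = os.path.join(cp_dir, prefix + "????????")
    cp_list = glob.glob(pattern)
    if not cp_list:
        return None
    # Zero padding makes lexicographic order match numeric step order,
    # so the last entry after sorting is the latest step.
    return sorted(cp_list)[-1]
```

After steps 1 to 3, scan_checkpoint("cp_hifigan", "g_") should return the path of g_02500000, and the same call with "do_" should return the path of do_02500000.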

Fine-Tuning

  1. Generate GTA mel-spectrograms in torch.Tensor format using Assem-VC.
    Name each mel-spectrogram file after its audio file by appending .gta to the full audio file name (the .wav extension is kept).
    Example:

    Audio File : p233_392.wav
    Mel-Spectrogram File : p233_392.wav.gta
    
  2. Run the following command.

    python train.py --config config_v1.json \
                    --input_wavs_dir <root_path_of_input_audios> \
                    --input_mels_dir <root_path_of_GTA_mels> \
                    --input_training_file <absolute_path_of_train_metadata_of_gta_mels> \
                    --input_validation_file <absolute_path_of_val_metadata_of_gta_mels> \
                    --fine_tuning True

    To train the V2 or V3 generator, replace config_v1.json with config_v2.json or config_v3.json.
    Checkpoints and a copy of the configuration file are saved in the cp_hifigan directory by default.
    You can change this path with the --checkpoint_path option.

    Here is an example command that may help you understand the arguments:

    python train.py --config config_v1.json \
                    --input_wavs_dir ../datasets/ \
                    --input_mels_dir ../datasets/ \
                    --input_training_file ../datasets/gta_metadata/gta_vctk_22k_train_10s_g2p.txt \
                    --input_validation_file ../datasets/gta_metadata/gta_vctk_22k_val_g2p.txt \
                    --fine_tuning True
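
Before fine-tuning, it can save a failed run to check that every audio file has a matching GTA mel-spectrogram. A minimal sketch follows; the helpers gta_path_for and find_missing_gta are hypothetical, and the check assumes the .gta files mirror the audio files' paths relative to their root directories (as in the example above, where both roots are ../datasets/). It only checks file names, not tensor contents.

```python
import os


def gta_path_for(wav_path):
    """Return the expected GTA mel path: the full audio file name plus '.gta'."""
    # p233_392.wav -> p233_392.wav.gta (the .wav extension is kept, not replaced)
    return wav_path + ".gta"


def find_missing_gta(wavs_dir, mels_dir):
    """List .wav files (paths relative to wavs_dir) with no .gta file in mels_dir."""
    missing = []
    for root, _dirs, files in os.walk(wavs_dir):
        for name in files:
            if not name.endswith(".wav"):
                continue
            rel = os.path.relpath(os.path.join(root, name), wavs_dir)
            # Assumption: the mel tree mirrors the audio tree under its own root.
            if not os.path.isfile(os.path.join(mels_dir, gta_path_for(rel))):
                missing.append(rel)
    return sorted(missing)
```

An empty result from find_missing_gta("../datasets/", "../datasets/") means every audio file under the root has its .gta counterpart.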

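The same command can be assembled from Python, for example in a script that fine-tunes several generator versions. This sketch uses only the flags documented above; the function build_finetune_command is hypothetical, and selecting config_v1.json, config_v2.json or config_v3.json by number simply follows the config file names listed above.

```python
import sys


def build_finetune_command(version, wavs_dir, mels_dir, train_file, val_file,
                           checkpoint_path=None):
    """Build the argv list for GTA fine-tuning with the V1, V2 or V3 generator."""
    if version not in (1, 2, 3):
        raise ValueError("version must be 1, 2 or 3")
    cmd = [
        sys.executable, "train.py",  # run train.py with the current interpreter
        "--config", f"config_v{version}.json",
        "--input_wavs_dir", wavs_dir,
        "--input_mels_dir", mels_dir,
        "--input_training_file", train_file,
        "--input_validation_file", val_file,
        "--fine_tuning", "True",
    ]
    # Without --checkpoint_path, checkpoints are saved to cp_hifigan by default.
    if checkpoint_path is not None:
        cmd += ["--checkpoint_path", checkpoint_path]
    return cmd
```

Passing the result to subprocess.run(cmd, check=True) from the repository root would launch training and raise an error if train.py exits with a non-zero status. Building an argument list instead of a shell string avoids quoting problems with paths that contain spaces.
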
Monitoring via TensorBoard

tensorboard --logdir cp_hifigan/logs --bind_all
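
If TensorBoard shows no runs, first confirm that training has written event files. A small sketch, assuming train.py writes its summaries under cp_hifigan/logs (the directory the command above points to); the helper find_event_files is hypothetical, and it relies on TensorBoard's usual events.out.tfevents.* file naming.

```python
import glob
import os


def find_event_files(log_dir="cp_hifigan/logs"):
    """Return TensorBoard event files under log_dir, including subdirectories."""
    # With recursive=True, "**" matches zero or more directories,
    # so files directly inside log_dir are found as well.
    pattern = os.path.join(log_dir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))
```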

Acknowledgements

We referred to HiFi-GAN, WaveGlow, MelGAN and Tacotron2 to implement this.

License

MIT License