shaojinding / Adversarial-Many-to-Many-VC

[INTERSPEECH 2020] "Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition" by Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna



Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Code for the paper "Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition".

Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna

Accepted by INTERSPEECH 2020

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time-Voice-Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Dataset: VCTK (see Data preprocessing below).

Requirements

  • Python 3.7 or newer
  • PyTorch with CUDA enabled
  • TensorFlow 1.13.1
  • Run pip install -r requirements.txt

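The requirements above can be checked with a small script before running anything; this is a sketch we add for convenience (the function `check_environment` is not part of the repo):

```python
import sys


def check_environment():
    """Report whether the Python/framework versions this repo expects are present.

    Sketch only: the version constraints mirror the Requirements list above
    (Python 3.7+, CUDA-enabled PyTorch, TensorFlow 1.13.1).
    """
    assert sys.version_info >= (3, 7), "Python 3.7 or newer is required"
    try:
        import torch
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch not installed -- run: pip install -r requirements.txt")
    try:
        import tensorflow as tf
        print("TensorFlow", tf.__version__, "(1.13.1 expected)")
    except ImportError:
        print("TensorFlow not installed -- run: pip install -r requirements.txt")


if __name__ == "__main__":
    check_environment()
```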
Data preprocessing

We use the speaker encoder model and vocoder model from here. We only train the voice conversion model (i.e., synthesizer).

Before running, place the pretrained speaker encoder at encoder/saved_models/pretrained.pt and the vocoder at vocoder/saved_models/pretrained/pretrained.pt.

  1. Download and uncompress the VCTK dataset.
  2. Manually split the data into train and test sets (there is no official split). Arrange the speaker folders as, e.g., <datasets_root>/VCTK/train/p227 and <datasets_root>/VCTK/test/p228
  3. Run python synthesizer_preprocess_audio.py <datasets_root>
  4. Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_train
  5. Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_test
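Step 2 above (the manual train/test split) can be scripted. The sketch below is ours, not part of the repo: the helper name `split_vctk` and its arguments are hypothetical, and it assumes the uncompressed VCTK audio sits in a single directory of per-speaker subfolders:

```python
import shutil
from pathlib import Path


def split_vctk(vctk_wav_dir, datasets_root, test_speakers):
    """Copy each VCTK speaker folder into <datasets_root>/VCTK/{train,test}/<speaker>.

    Speakers listed in `test_speakers` (e.g. {"p228"}) go to test/,
    everything else to train/. Hypothetical helper, shown for illustration.
    """
    vctk_wav_dir = Path(vctk_wav_dir)
    datasets_root = Path(datasets_root)
    for spk_dir in sorted(vctk_wav_dir.iterdir()):
        if not spk_dir.is_dir():
            continue  # skip stray files such as READMEs
        subset = "test" if spk_dir.name in test_speakers else "train"
        dest = datasets_root / "VCTK" / subset / spk_dir.name
        shutil.copytree(spk_dir, dest)
```

After this, <datasets_root> can be passed directly to the preprocessing scripts in steps 3-5.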

Training and inference

To launch training:

$ python synthesizer_train.py vc_adversarial <datasets_root>/SV2TTS/synthesizer_train

To run inference, use synthesis_ppg_script.py. Set syn_dir to the path of the trained model, e.g., synthesizer/saved_models/logs-train_adversarial_vctk/taco_pretrained

Acknowledgement

The code is adapted from CorentinJ / Real-Time-Voice-Cloning.

Cite the work

@inproceedings{dingimproving,
  title={Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition},
  author={Ding, Shaojin and Zhao, Guanlong and Gutierrez-Osuna, Ricardo},
  booktitle={Proc. Interspeech 2020},
  year={2020}
}
