PyTorch Implementation of Non-Parallel Voice Conversion with CycleVAE

Further developments

Many-to-Many VC with CycleVAE
Many-to-Many VC with CycleVQVAE
Spectral and excitation modeling
2-dimensional speaker space for speaker interpolation
Waveform generation with neural vocoder
Mel-spectrogram modeling

All further developments have been moved to, and are being integrated into, the following repo: https://github.com/patrickltobing/cyclevae-vc-neuralvoco

Thanks!

Usage

$ cd tools
$ make
$ cd ../egs/one-to-one

Open run.sh.

Set stage=0123 for full feature extraction (stages 0, 1, 2, and 3).

$ bash run.sh

To compute the speaker configs, run with stage=1, then with stage=a, then adjust the speaker configs in conf/ according to the computed statistics, and finally run with stage=1 again.

The computed f0 and power histograms are stored in exp/init_spk_stat.

Set stage=4 for training.

$ bash run.sh
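Stage a turns f0 and power histograms into per-speaker thresholds. The sketch below shows one simple way to read an f0 range off a histogram by trimming a small fraction of voiced frames at each end. It is an illustration only, not the repo's code: the function names, the 10 Hz bin width, the 5% tail fraction, and the convention that unvoiced frames carry f0 = 0 are all assumptions. Power thresholds could be picked from their histogram in the same way.

```python
from collections import Counter


def f0_histogram(f0_values, bin_width=10.0):
    """Count voiced f0 values (Hz) into bins of width bin_width.

    ASSUMPTION: unvoiced frames are marked with f0 == 0 and are skipped.
    """
    return Counter(int(f // bin_width) * bin_width for f in f0_values if f > 0)


def f0_range_from_histogram(hist, bin_width=10.0, tail=0.05):
    """Return (min_f0, max_f0) that drop about `tail` of the voiced frames at each end.

    Hypothetical heuristic for illustration; the repo may choose thresholds differently.
    """
    if not hist:
        raise ValueError("no voiced frames in histogram")
    total = sum(hist.values())
    bins = sorted(hist)

    def edge(ordered_bins):
        # First bin at which the cumulative count exceeds the trimmed tail mass.
        acc = 0
        for b in ordered_bins:
            acc += hist[b]
            if acc > tail * total:
                return b
        return ordered_bins[-1]

    low = edge(bins)                      # lower edge of the lowest kept bin
    high = edge(bins[::-1]) + bin_width   # upper edge of the highest kept bin
    return low, high
```

With a few octave errors mixed into a mostly steady contour, the outlier bins fall inside the trimmed tails, so the range hugs the dominant bin. The resulting values would then be entered by hand into the speaker's config in conf/.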
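The stage values used above (0123, 1, a, 4) suggest that each character of stage selects one stage to run. The sketch below models that reading; it assumes run.sh checks each stage independently (for example with a shell test such as `echo ${stage} | grep -q 1`) and runs them in the order given under Stage details. The names STAGE_ORDER and stages_to_run are hypothetical and do not come from the repo.

```python
# ASSUMPTION: stage order as listed under "Stage details"; "a" sits between 1 and 2.
STAGE_ORDER = ("0", "1", "a", "2", "3", "4", "5", "6")


def stages_to_run(stage):
    """Return the stages selected by a stage string such as "0123".

    Models the assumed run.sh behavior: a stage runs if its character
    appears anywhere in the string, independently of the other stages.
    """
    return [s for s in STAGE_ORDER if s in stage]
```

Under this reading, the speaker-config workflow is three separate invocations (stage=1, then stage=a, then stage=1 again after editing conf/), because stage a needs the features from stage 1 and stage 1 must be rerun once the thresholds change.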

Stage details

STAGE 0: data list preparation

STAGE 1: feature extraction

STAGE a: calculation of f0 and power threshold statistics for feature extraction [speaker configs are in conf/]

STAGE 2: calculation of feature statistics for model development

STAGE 3: extraction of converted excitation features for cyclic flow

STAGE 4: model training

STAGE 5: calculation of global variance (GV) statistics of converted mel-cepstrum (mcep)

STAGE 6: decoding and waveform conversion
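Stages 2 and 5 both compute feature statistics. The plain-Python sketch below shows common textbook definitions: per-dimension mean and standard deviation over all frames (as used for feature normalization), and GV as the per-utterance, per-dimension variance over frames, summarized by its mean and variance across utterances. GV statistics are commonly used to counter the over-smoothing of converted spectra. This is not the repo's implementation; the function names and the list-of-frames data layout are assumptions, and the exact statistics the repo stores may differ.

```python
import math


def feature_stats(utterances):
    """Per-dimension mean and standard deviation over all frames of all utterances.

    ASSUMED layout: utterances -> frames -> list of float feature values.
    """
    frames = [frame for utt in utterances for frame in utt]
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return mean, std


def gv_stats(utterances):
    """Global variance: variance over frames within each utterance, per dimension,
    then the mean and variance of those per-utterance GVs across utterances."""
    dim = len(utterances[0][0])
    gvs = []
    for utt in utterances:
        n = len(utt)
        mu = [sum(f[d] for f in utt) / n for d in range(dim)]
        gvs.append([sum((f[d] - mu[d]) ** 2 for f in utt) / n for d in range(dim)])
    m = len(gvs)
    gv_mean = [sum(g[d] for g in gvs) / m for d in range(dim)]
    gv_var = [sum((g[d] - gv_mean[d]) ** 2 for g in gvs) / m for d in range(dim)]
    return gv_mean, gv_var
```

For stage 5, the input would be the converted mcep sequences rather than the natural ones; a real implementation would use NumPy arrays instead of nested lists.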

Trained examples

Examples of trained models, converted wavs, and logs can be accessed in trained_example, which uses speakers SF1 and TF1 from the Voice Conversion Challenge (VCC) 2018.

$ cd cyclevae-vc_trained/egs/one-to-one/

Open run.sh.

Set stage=5 for GV statistics calculation.

$ bash run.sh

Set stage=6 for decoding and waveform conversion.

$ bash run.sh

One example of a trained model, with its converted wavs and logs, is located in exp/tr50_22.05k_cyclevae_gauss_VCC2SF1-VCC2TF1_hl1_hu1024_ld32_ks3_ds2_cyc2_lr1e-4_bs80_wd0.0_do0.5_epoch500_bsu1_bsue1/

To summarize the training log, use

$ sh loss_summary.sh
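The experiment directory name appears to encode the training settings as underscore-separated tokens, many of them a short prefix followed by a value (hu1024, lr1e-4, do0.5). The sketch below splits such a name into plain tags and prefix/value pairs so the settings of different runs can be compared. The parser and the GUESSED_MEANINGS table are hypothetical: the meanings are guesses from common abbreviations and are not confirmed by the repo, and tokens such as bsu and bsue are left uninterpreted.

```python
import os
import re

# ASSUMED meanings, guessed from common abbreviations; NOT confirmed by the repo.
GUESSED_MEANINGS = {
    "hl": "hidden layers",
    "hu": "hidden units",
    "ld": "latent dimension",
    "ks": "kernel size",
    "ds": "dilation size",
    "cyc": "number of cycles",
    "lr": "learning rate",
    "bs": "batch size",
    "wd": "weight decay",
    "do": "dropout rate",
    "epoch": "training epochs",
}


def parse_exp_name(path):
    """Split an experiment directory name into plain tags and prefix+value parameters."""
    name = os.path.basename(path.rstrip("/"))
    tags, params = [], {}
    for tok in name.split("_"):
        # Lowercase prefix followed by a value starting with a digit, e.g. "hu1024", "lr1e-4".
        m = re.fullmatch(r"([a-z]+)(\d[\w.\-]*)", tok)
        if m:
            params[m.group(1)] = m.group(2)
        else:
            tags.append(tok)  # e.g. "cyclevae", "gauss", speaker pair
    return tags, params
```

For example, `GUESSED_MEANINGS.get(key, "unknown")` can be used to label each entry of the returned parameter dict when printing it.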

Contact

If there are any questions or problems, especially about hyperparameters and other settings, please let me know.

Patrick Lumban Tobing (Patrick)

patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp

Reference

P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, "Non-parallel voice conversion with cyclic variational autoencoder", arXiv preprint arXiv:1907.10185, 2019. (Accepted for INTERSPEECH 2019)

About

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

Apache License 2.0
