PyTorch Implementation of Non-Parallel Voice Conversion with CycleVAE
Usage
$cd tools
$make
$cd ../egs/one-to-one
open run.sh
set stage=0123 to run stages 0 through 3 (full feature extraction)
$bash run.sh
to compute the speaker configs, first run with stage=1, then with stage=a; adjust the f0 and power thresholds in conf/ based on the resulting statistics, then run stage=1 again
the computed f0 and power histograms will be stored in exp/init_spk_stat
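The stage=1 → stage=a → stage=1 iteration above can be scripted. A minimal sketch, assuming run.sh keeps a single `stage=` line near its top (the sed pattern is illustrative and not verified against the actual script; a stand-in run.sh is created here so the snippet is self-contained):

```shell
# Illustrative only: toggle the stage variable in run.sh between runs.
printf 'stage=0123\n' > run.sh

set_stage() {
  # rewrite the stage= line to the requested value (portable sed usage)
  sed "s/^stage=.*/stage=$1/" run.sh > run.sh.tmp && mv run.sh.tmp run.sh
}

set_stage 1   # feature extraction
# bash run.sh
set_stage a   # f0/power statistics; inspect exp/init_spk_stat, edit conf/
# bash run.sh
set_stage 1   # re-extract features with the updated thresholds
# bash run.sh
grep '^stage=' run.sh
```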
set stage=4 for training
$bash run.sh
Stage details
STAGE 0: data list preparation
STAGE 1: feature extraction
STAGE a: calculation of f0 and power threshold statistics for feature extraction [speaker configs are in conf/]
STAGE 2: calculation of feature statistics for model development
STAGE 3: extraction of converted excitation features for cyclic flow
STAGE 4: model training
STAGE 5: calculation of GV statistics of converted mcep
STAGE 6: decoding and waveform conversion
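Multiple stages are selected by concatenating their identifiers in the stage variable (e.g. stage=0123 above). A minimal sketch of this Kaldi-style pattern — an assumption about how run.sh is organized, not its actual contents:

```shell
# Hypothetical sketch: a step runs if its identifier appears in $stage.
stage=0123

run_step() {
  # succeed when the step identifier occurs anywhere in $stage
  case "$stage" in
    *"$1"*) return 0 ;;
    *)      return 1 ;;
  esac
}

if run_step 0; then echo "STAGE 0: data list preparation"; fi
if run_step 1; then echo "STAGE 1: feature extraction"; fi
if run_step 4; then echo "STAGE 4: model training"; fi
```

With stage=0123, stages 0 and 1 run while stage 4 is skipped; setting stage=4 later runs only training.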
Trained examples
Examples of trained models, converted wavs, and logs can be accessed in trained_example, which uses speakers SF1 and TF1 from the Voice Conversion Challenge (VCC) 2018.
$cd cyclevae-vc_trained/egs/one-to-one/
open run.sh
set stage=5 for GV statistics calculation
$bash run.sh
set stage=6 for decoding and wav conversion
$bash run.sh
an example of the trained model, converted wavs, and logs is located in exp/tr50_22.05k_cyclevae_gauss_VCC2SF1-VCC2TF1_hl1_hu1024_ld32_ks3_ds2_cyc2_lr1e-4_bs80_wd0.0_do0.5_epoch500_bsu1_bsue1/
to summarize the training log, use
$sh loss_summary.sh
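If loss_summary.sh is unavailable in your setup, per-epoch losses can also be pulled from a log with awk. A hypothetical sketch that assumes log lines of the form `... loss = <value>` (the real log format may differ; a tiny stand-in log is written so the example is self-contained):

```shell
# Write a stand-in training log for illustration.
printf 'epoch 1: loss = 2.50\nepoch 2: loss = 1.50\n' > train.log

# Average every "loss = X" value found in the log.
mean=$(awk -F'= ' '/loss/ {s += $2; n++} END {printf "%.2f", s/n}' train.log)
echo "mean loss: $mean"
```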
Soon-to-be-added features
- CycleVQVAE
- Many-to-many VC with CycleVAE
- Many-to-many VC with CycleVQVAE
These have already been implemented and will be added after the journal paper is finished.
Contact
If there are any questions or problems, especially about hyperparameters and other settings, please let me know.
Patrick Lumban Tobing (Patrick)
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
Reference
P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” arXiv preprint arXiv:1907.10185, 2019. (Accepted for INTERSPEECH 2019)