PyTorch Implementation of Non-Parallel Voice Conversion with CycleVAE
Usage
$cd tools
$make
$cd ../egs/one-to-one
open run.sh
set stage=0123 to run stages 0 through 3 (full feature extraction)
$bash run.sh
to compute the speaker configs, first run with stage=1, then with stage=a; adjust the f0 and power thresholds in conf/ based on the resulting statistics, then run stage=1 again
the computed f0 and power histograms will be stored in exp/init_spk_stat
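The stage=1 → stage=a → stage=1 iteration above can be scripted. A minimal sketch, assuming run.sh keeps a single `stage=` line near its top (the sed pattern is illustrative and not verified against the actual script; a stand-in run.sh is created here so the snippet is self-contained):

```shell
# Illustrative only: toggle the stage variable in run.sh between runs.
printf 'stage=0123\n' > run.sh

set_stage() {
  # rewrite the stage= line to the requested value (portable sed usage)
  sed "s/^stage=.*/stage=$1/" run.sh > run.sh.tmp && mv run.sh.tmp run.sh
}

set_stage 1   # feature extraction
# bash run.sh
set_stage a   # f0/power statistics; inspect exp/init_spk_stat, edit conf/
# bash run.sh
set_stage 1   # re-extract features with the updated thresholds
# bash run.sh
grep '^stage=' run.sh
```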
set stage=4 for training
$bash run.sh
Stage details
STAGE 0: data list preparation
STAGE 1: feature extraction
STAGE a: calculation of f0 and power threshold statistics for feature extraction [speaker configs are in conf/]
STAGE 2: calculation of feature statistics for model development
STAGE 3: extraction of converted excitation features for cyclic flow
STAGE 4: model training
STAGE 5: calculation of GV statistics of converted mcep
STAGE 6: decoding and waveform conversion
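Multiple stages are selected by concatenating their identifiers in the stage variable (e.g. stage=0123 above). A minimal sketch of this Kaldi-style pattern — an assumption about how run.sh is organized, not its actual contents:

```shell
# Hypothetical sketch: a step runs if its identifier appears in $stage.
stage=0123

run_step() {
  # succeed when the step identifier occurs anywhere in $stage
  case "$stage" in
    *"$1"*) return 0 ;;
    *)      return 1 ;;
  esac
}

if run_step 0; then echo "STAGE 0: data list preparation"; fi
if run_step 1; then echo "STAGE 1: feature extraction"; fi
if run_step 4; then echo "STAGE 4: model training"; fi
```

With stage=0123, stages 0 and 1 run while stage 4 is skipped; setting stage=4 later runs only training.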
Trained examples
Examples of trained models, converted wavs, and logs can be accessed in trained_example, which uses speakers SF1 and TF1 from the Voice Conversion Challenge (VCC) 2018.
$cd cyclevae-vc_trained/egs/one-to-one/
open run.sh
set stage=5 for GV statistics calculation
$bash run.sh
set stage=6 for decoding and wav conversion
$bash run.sh
an example of the trained model, converted wavs, and logs is located in exp/tr50_22.05k_cyclevae_gauss_VCC2SF1-VCC2TF1_hl1_hu1024_ld32_ks3_ds2_cyc2_lr1e-4_bs80_wd0.0_do0.5_epoch500_bsu1_bsue1/
to summarize the training log, use
$sh loss_summary.sh
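If loss_summary.sh is unavailable in your setup, per-epoch losses can also be pulled from a log with awk. A hypothetical sketch that assumes log lines of the form `... loss = <value>` (the real log format may differ; a tiny stand-in log is written so the example is self-contained):

```shell
# Write a stand-in training log for illustration.
printf 'epoch 1: loss = 2.50\nepoch 2: loss = 1.50\n' > train.log

# Average every "loss = X" value found in the log.
mean=$(awk -F'= ' '/loss/ {s += $2; n++} END {printf "%.2f", s/n}' train.log)
echo "mean loss: $mean"
```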
Soon-to-be-added features
- CycleVQVAE
- Many-to-many VC with CycleVAE
- Many-to-many VC with CycleVQVAE
These have already been implemented and will be added after the journal paper is finished.
Contact
If there are any questions or problems, especially about hyperparameters and other settings, please let me know.
Patrick Lumban Tobing (Patrick)
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
Reference
P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” arXiv preprint arXiv:1907.10185, 2019. (Accepted for INTERSPEECH 2019)