- Many-to-Many VC with CycleVAE
- Many-to-Many VC with CycleVQVAE
- Spectral and excitation modeling
- 2-dimensional speaker space for speaker interpolation
- Waveform generation with neural vocoder
- Mel-spectrogram modeling
All further development has been moved to, and is being integrated into, the following repo: https://github.com/patrickltobing/cyclevae-vc-neuralvoco
Thanks!
$cd tools
$make
$cd ../egs/one-to-one
Open run.sh
Set stage=0123 for full feature extraction (see the sketch after the command below)
$bash run.sh
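
For reference, stage selection is a single variable near the top of run.sh; a minimal sketch of that line (the surrounding contents may differ):

# inside run.sh -- digits are concatenated, so stage=0123 executes
# STAGE 0 through STAGE 3 (full feature extraction)
stage=0123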
To compute the speaker configs, first run with stage=1, then with stage=a, adjust the configs in conf/ according to the resulting statistics, and run with stage=1 again.
The computed f0 and power histograms will be stored in exp/init_spk_stat.
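
The f0 range and power threshold read off these histograms go into the per-speaker files under conf/. A hypothetical sketch of such a config (field names and layout are illustrative only; check the actual files in conf/ for the real format):

# conf/VCC2SF1.f0 -- hypothetical layout, not necessarily the repo's format
minf0=140   # lower f0 bound read off the exp/init_spk_stat histogram
maxf0=340   # upper f0 bound
pow=-25     # power threshold [dB] separating speech frames from silence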
Set stage=4 for training
$bash run.sh
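
Training can take a long time; a common pattern (not part of the repo's scripts) is to run it in the background and keep a copy of the console output:

# run stage 4 in the background and log the console output
nohup bash run.sh > train_stage4.log 2>&1 &
tail -f train_stage4.log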
STAGE 0: data list preparation
STAGE 1: feature extraction
STAGE a: calculation of f0 and power threshold statistics for feature extraction [speaker configs are in conf/]
STAGE 2: calculation of feature statistics for model development
STAGE 3: extraction of converted excitation features for cyclic flow
STAGE 4: model training
STAGE 5: calculation of GV statistics of converted mcep
STAGE 6: decoding and waveform conversion
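
Because run.sh selects stages through the single stage variable, the numbered stages can also be driven one at a time; a small sketch (assumes GNU sed and a line of the form stage=... at the top of run.sh; STAGE a is only needed once, when computing the speaker configs):

# rewrite the stage variable and rerun, one numbered stage at a time
for s in 0 1 2 3 4 5 6; do
    sed -i "s/^stage=.*/stage=${s}/" run.sh
    bash run.sh
done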
Examples of trained models, converted wavs, and logs can be found in trained_example, which uses speakers SF1 and TF1 from the Voice Conversion Challenge (VCC) 2018.
$cd cyclevae-vc_trained/egs/one-to-one/
Open run.sh
Set stage=5 for GV statistics calculation
$bash run.sh
Set stage=6 for decoding and waveform conversion
$bash run.sh
An example of the trained model, converted wavs, and logs is located in exp/tr50_22.05k_cyclevae_gauss_VCC2SF1-VCC2TF1_hl1_hu1024_ld32_ks3_ds2_cyc2_lr1e-4_bs80_wd0.0_do0.5_epoch500_bsu1_bsue1/
To summarize the training log, use
$sh loss_summary.sh
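
For a quick look without the script, a generic one-liner along these lines pulls loss entries from a log (the file name train.log and the grep pattern are illustrative; adapt them to the actual log in the exp/ directory above):

# hypothetical quick check: show the last reported loss lines of a log
grep -i "loss" exp/tr50_*/train.log | tail -n 20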
If there are any questions or problems, especially about hyperparameters and other settings, please let me know.
Patrick Lumban Tobing (Patrick)
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” arXiv preprint arXiv:1907.10185, 2019. (Accepted for INTERSPEECH 2019)