indexalice / cyclevae-vc

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

PyTorch Implementation of Non-Parallel Voice Conversion with CycleVAE


Usage

$cd tools
$make
$cd ../egs/one-to-one

open run.sh

set stage=0123 for full feature extraction

$bash run.sh

to compute speaker configs, run with stage=1, then with stage=a, update the speaker configs in conf/ based on the computed statistics, then run stage=1 again

computed f0 and power histograms will be stored in exp/init_spk_stat
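
The steps above involve changing the stage value in run.sh several times. A minimal sketch of a helper that does this from the command line, assuming run.sh contains a line beginning with `stage=` (an assumption; the function name `set_stage` is hypothetical and not part of the repository):

```shell
#!/bin/bash
# Hypothetical helper, not part of the repository. Assumes the script has a
# line that starts with "stage=" (check run.sh before relying on this).
set_stage() {
    local script="$1" value="$2"
    # -i.bak edits the file in place and keeps a backup copy; this form is
    # accepted by both GNU and BSD sed.
    sed -i.bak "s/^stage=.*/stage=${value}/" "$script"
}
```

With this in place, the speaker config pass could be run as `set_stage run.sh 1`, `bash run.sh`, `set_stage run.sh a`, `bash run.sh`, followed by editing the configs in conf/ and repeating stage 1.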

set stage=4 for training

$bash run.sh

Stage details

STAGE 0: data list preparation

STAGE 1: feature extraction

STAGE a: calculation of f0 and power threshold statistics for feature extraction [speaker configs are in conf/]

STAGE 2: calculation of feature statistics for model development

STAGE 3: extraction of converted excitation features for cyclic flow

STAGE 4: model training

STAGE 5: calculation of GV statistics of converted mcep

STAGE 6: decoding and waveform conversion
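
The stage list above can be read as a lookup from each character of the stage string to one step. The following sketch prints which steps a given stage string would select, assuming each character in the string is treated as one stage identifier (an assumption about run.sh; the function `describe_stages` is hypothetical and only illustrates the idea):

```shell
#!/bin/bash
# Hypothetical helper, not part of the repository: lists the stages that a
# stage string would select, assuming each character is one stage identifier.
describe_stages() {
    local stages="$1" id
    for id in 0 1 a 2 3 4 5 6; do
        # Skip identifiers that do not appear in the stage string.
        case "$stages" in
            *"$id"*) ;;
            *) continue ;;
        esac
        # Descriptions are taken from the stage list in this README.
        case "$id" in
            0) echo "0: data list preparation" ;;
            1) echo "1: feature extraction" ;;
            a) echo "a: f0 and power threshold statistics" ;;
            2) echo "2: feature statistics for model development" ;;
            3) echo "3: converted excitation features for cyclic flow" ;;
            4) echo "4: model training" ;;
            5) echo "5: GV statistics of converted mcep" ;;
            6) echo "6: decoding and waveform conversion" ;;
        esac
    done
}

describe_stages 0123
```

Calling `describe_stages 4` would list only the training step, which matches the "set stage=4 for training" instruction in Usage.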


Trained examples

Examples of trained models, converted wavs, and logs are available in trained_example, which uses speakers SF1 and TF1 from the Voice Conversion Challenge (VCC) 2018.

$cd cyclevae-vc_trained/egs/one-to-one/

open run.sh

set stage=5 for GV statistics calculation

$bash run.sh

set stage=6 for decoding and wav conversion

$bash run.sh

one example of a model, converted wavs, and logs is located in exp/tr50_22.05k_cyclevae_gauss_VCC2SF1-VCC2TF1_hl1_hu1024_ld32_ks3_ds2_cyc2_lr1e-4_bs80_wd0.0_do0.5_epoch500_bsu1_bsue1/

to summarize the training log, use

$sh loss_summary.sh

Soon to be added features

  • CycleVQVAE
  • Many-to-Many VC with CycleVAE
  • Many-to-Many VC with CycleVQVAE

These features are already implemented and will be added after the journal paper is finished.


Contact

If there are any questions or problems, especially about hyperparameters and other settings, please let me know.

Patrick Lumban Tobing (Patrick)

patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp


Reference

P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” arXiv preprint arXiv:1907.10185, 2019. (Accepted for INTERSPEECH 2019)

License: Apache License 2.0