QuickVC: HuBERT-VITS-MSiSTFTNet Voice Conversion

Clone of the official QuickVC implementation.
Official demo.

main branch: Refactored & improved
minfast branch: Fast training with minimum changes

Pretrained Model

Put the pretrained model into logs/quickvc.

Inference with pretrained model

python convert.py

Edit convert.txt to select the source and target.
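
The layout of convert.txt is not described here. As a rough sketch, assuming it follows the FreeVC convention of one `title|source|target` line per conversion (an assumption to check against convert.py), each line could be read like this. `parse_convert_line` and the example paths are hypothetical, not part of the repository.

```python
def parse_convert_line(line):
    """Split one 'title|source|target' line into its fields (assumed format)."""
    title, source, target = line.strip().split("|")
    return {"title": title, "source": source, "target": target}


# Placeholder paths for illustration only.
example = "demo|path/to/source.wav|path/to/target.wav\n"
pair = parse_convert_line(example)
print(pair["title"], ":", pair["source"], "->", pair["target"])
```

Under that assumption, the source file supplies the speech content and the target file supplies the voice to convert to.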

Preprocess

Hubert-Soft

cd dataset
python encode.py soft dataset/vctk-16k dataset/vctk-16k

For spectrogram-resize data augmentation, please refer to FreeVC.
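
For orientation, the idea behind spectrogram resizing can be sketched in plain Python: stretch or squeeze the spectrogram along the frequency axis by a ratio, then crop or pad it back to the original number of bins. This is a simplified illustration under stated assumptions (linear interpolation, Gaussian noise on the padded rows, a hypothetical function name and parameters), not the code used by FreeVC or by this repository.

```python
import random


def resize_spectrogram(mel, ratio, noise_std=0.01, seed=0):
    """Resize a spectrogram (list of frequency rows) along frequency, keeping its shape.

    Assumes mel has at least two frequency rows, ordered from low to high.
    """
    rng = random.Random(seed)
    n_bins = len(mel)
    new_bins = max(2, round(n_bins * ratio))

    # Linear interpolation between neighbouring frequency rows.
    resized = []
    for i in range(new_bins):
        pos = i * (n_bins - 1) / (new_bins - 1)
        lo = int(pos)
        hi = min(lo + 1, n_bins - 1)
        w = pos - lo
        resized.append([(1 - w) * a + w * b for a, b in zip(mel[lo], mel[hi])])

    if new_bins >= n_bins:
        # Stretched: drop the excess high-frequency rows.
        return resized[:n_bins]

    # Squeezed: pad the top with noisy copies of the highest row.
    top = resized[-1]
    padding = [[v + rng.gauss(0.0, noise_std) for v in top]
               for _ in range(n_bins - new_bins)]
    return resized + padding


# Toy spectrogram: 10 frequency bins, 3 frames.
mel = [[float(b)] * 3 for b in range(10)]
squeezed = resize_spectrogram(mel, 0.8)
stretched = resize_spectrogram(mel, 1.2)
print(len(squeezed), len(stretched))
```

A ratio below 1 compresses the frequency content and a ratio above 1 expands it, so the same utterance is seen with shifted speaker characteristics while its shape stays fixed.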

Train

python train.py

To change the config path or the model name, edit the defaults of these arguments in utils.py:

parser.add_argument('-c', '--config', type=str, default="./configs/quickvc.json", help='JSON file for configuration')
parser.add_argument('-m', '--model', type=str, default="quickvc", help='Model name')
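
Since both values are declared as argparse options, they can presumably also be overridden on the command line rather than by editing the defaults, assuming train.py reads its arguments through this parser. The sketch below rebuilds only the two arguments quoted above to show the behaviour. `build_parser` is a hypothetical helper and the custom values are placeholders.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors only the two arguments from utils.py.
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config', type=str,
                        default="./configs/quickvc.json",
                        help='JSON file for configuration')
    parser.add_argument('-m', '--model', type=str,
                        default="quickvc",
                        help='Model name')
    return parser


# No flags given: the defaults apply.
defaults = build_parser().parse_args([])
# Flags given: the defaults are overridden (placeholder values).
custom = build_parser().parse_args(['-c', './configs/my_config.json', '-m', 'my_model'])

print(defaults.config, defaults.model)
print(custom.config, custom.model)
```

Under that assumption, the equivalent invocation would be `python train.py -c ./configs/my_config.json -m my_model`.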

Info from official repository

Naturalness is language-dependent (cf. SoftVC): issue #4
Training time: 1 to 2 weeks on a single RTX 3090: issue #6

References

Original paper

@misc{2302.08296,
Author = {Houjian Guo and Chaoran Liu and Carlos Toshinori Ishi and Hiroshi Ishiguro},
Title = {QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion},
Year = {2023},
Eprint = {arXiv:2302.08296},
}

Acknowledgements

MS-ISTFT-VITS: Decoder
Soft-VC: PriorEncoder's Hubert-soft
FreeVC: data augmentation

MIT License
