Demo outputs:

audio.mp4

Instructions

Create a virtual environment: python -m venv ./venv
Open demo.ipynb and install the dependencies.
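After installing OpenCC, move the jyutjyu config files into the opencc package's data directory (the paths below follow the Windows venv layout; on Linux/macOS, site-packages is under ./venv/lib/pythonX.Y/):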
1. mv ./opencc/jyutjyu.json ./venv/lib/site-packages/opencc/clib/share/opencc/jyutjyu.json
2. mv ./opencc/jyutjyu.ocd2 ./venv/lib/site-packages/opencc/clib/share/opencc/jyutjyu.ocd2
Build the Cython extension: cd vits/monotonic_align, then run python setup.py build_ext --inplace
Download the model:
1. https://huggingface.co/xiaomaiiwn/vits-cantonese/blob/main/model/G.pth

I do not own the models; use them in accordance with their license.
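The manual setup steps could also be scripted. The following is a minimal, hypothetical helper (not part of the repository), assuming it is run from the repository root with the venv active, that the pip opencc package uses the clib/share/opencc layout shown in the paths above, that huggingface_hub is installed, and that the bundled setup.py follows upstream VITS (which cythonizes its core module against NumPy, so Cython, NumPy and a C compiler are needed). Where demo.ipynb expects G.pth is not stated here, so the download location is also an assumption. Locating the OpenCC data directory through opencc.__file__ avoids hardcoding the site-packages path, which varies by platform.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from pathlib import Path

JYUTJYU_FILES = ("jyutjyu.json", "jyutjyu.ocd2")
MODEL_BLOB_URL = "https://huggingface.co/xiaomaiiwn/vits-cantonese/blob/main/model/G.pth"
MODEL_REPO_ID = "xiaomaiiwn/vits-cantonese"
MODEL_FILENAME = "model/G.pth"


def opencc_data_dir() -> Path:
    """Return <site-packages>/opencc/clib/share/opencc for the active interpreter."""
    # Resolves to the installed pip package (assumes ./opencc has no __init__.py).
    import opencc

    return Path(opencc.__file__).resolve().parent / "clib" / "share" / "opencc"


def install_opencc_files(src_dir: Path, dest_dir: Path, move: bool = True) -> list[Path]:
    """Move (or copy) the jyutjyu config files into OpenCC's data directory."""
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"OpenCC data directory not found: {dest_dir}")
    installed = []
    for name in JYUTJYU_FILES:
        dest = dest_dir / name
        if move:
            shutil.move(str(src_dir / name), str(dest))
        else:
            shutil.copy2(src_dir / name, dest)
        installed.append(dest)
    return installed


def build_ext_command(python: str = sys.executable) -> list[str]:
    """Cython build command, pinned to the current (venv) interpreter."""
    return [python, "setup.py", "build_ext", "--inplace"]


def build_monotonic_align(repo_root: Path = Path(".")) -> None:
    # Same as: cd vits/monotonic_align && python setup.py build_ext --inplace
    subprocess.run(build_ext_command(), cwd=repo_root / "vits" / "monotonic_align", check=True)


def resolve_url(blob_url: str) -> str:
    """Turn a Hugging Face /blob/ page URL into a direct-download /resolve/ URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def download_model(local_dir: str = ".") -> Path:
    """Fetch G.pth; the subfolder is kept, so it lands at <local_dir>/model/G.pth."""
    from huggingface_hub import hf_hub_download  # assumption: pip install huggingface_hub

    return Path(hf_hub_download(repo_id=MODEL_REPO_ID, filename=MODEL_FILENAME, local_dir=local_dir))


if __name__ == "__main__":
    for path in install_opencc_files(Path("opencc"), opencc_data_dir()):
        print("installed", path)
    build_monotonic_align()
    print("direct link:", resolve_url(MODEL_BLOB_URL))
    print("model saved to", download_model())
```

Running it performs the move, the build and the download in order. Passing move=False to install_opencc_files copies instead of moving, which leaves the repository's ./opencc folder intact for a later reinstall.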

Cantonese text-to-speech with a VITS implementation

MIT License
