Unofficial implementation of ConvNeXt-TTS (paper) for my own experiments.
The model architecture has been slightly modified.
- Install dependencies using Rye (link).
- Download the JSUT corpus and the full-context labels (link), then resample the wave files (basic5000) to 24 kHz.
- Create a `default.yaml` file under the `convnext_tts/bin/conf/path` directory, setting `wav_dir`, `lab_dir`, and `data_root` according to your environment, using `src/convnext_tts/bin/conf/path/dummy.yaml` as a reference.
- Run `exp/jsut/run.sh`.
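
The resampling step in the list above is not scripted here, so the following is only a minimal sketch of one way to do it, not the method this repository uses. It assumes `numpy`, `scipy`, and `soundfile` are installed, and the names `resample_to_target` and `resample_dir` are hypothetical helpers introduced for illustration.

```python
from math import gcd
from pathlib import Path

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 24_000  # target sampling rate named in the setup steps


def resample_to_target(samples: np.ndarray, orig_sr: int,
                       target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a waveform (time on axis 0) from orig_sr to target_sr."""
    if orig_sr == target_sr:
        return samples  # nothing to do
    g = gcd(orig_sr, target_sr)
    # Polyphase resampling applies an anti-aliasing filter while changing the rate.
    return resample_poly(samples, target_sr // g, orig_sr // g)


def resample_dir(src_dir: Path, dst_dir: Path) -> None:
    """Resample every .wav file in src_dir and write the result to dst_dir."""
    import soundfile as sf  # assumed third-party dependency for WAV I/O

    dst_dir.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(src_dir.glob("*.wav")):
        samples, orig_sr = sf.read(str(wav_path), dtype="float32")
        resampled = resample_to_target(samples, orig_sr)
        sf.write(str(dst_dir / wav_path.name), resampled, TARGET_SR)
```

A call such as `resample_dir(Path("path/to/basic5000/wav"), Path("path/to/wav24k"))` would then produce the directory to point `wav_dir` at; both paths are placeholders to adapt to your environment.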
Training is currently running, but it seems likely to fail.
Training WaveNeXt, the vocoder module of this model, appears to be harder than training Vocos, which is why this model does not yet train stably.
Still under development...