My implementation of Vocos (paper) for JSUT (link), powered by Lightning.
pip install torch torchaudio lightning pandas matplotlib
or
docker image build -t vocos -f docker/Dockerfile .
docker container run --rm -it --gpus all -v $(pwd):/work vocos
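After either install route, a quick check can confirm that the dependencies are importable before starting a long training run. The sketch below is not part of this repository; it only assumes that the top-level module names match the package names in the pip command above.

```python
from importlib.util import find_spec

# Top-level module names for the packages installed by the pip command above.
REQUIRED = ["torch", "torchaudio", "lightning", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the same environment (or container) that will be used for training.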
Running run.sh automatically downloads the data and begins training, so the following commands are all that is needed.
cd scripts
./run.sh
synthesis.sh uses last.ckpt by default. To synthesize with a different checkpoint, change the checkpoint path in the script.
cd scripts
./synthesis.sh
A trained model is available at the following link.
https://huggingface.co/reppy4620/vocos/blob/main/jsut_1000.ckpt
It contains model weights as well as some training info.
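To see what the checkpoint holds, something like the following sketch may help. It assumes the file follows the usual Lightning checkpoint layout, with the weights under a `state_dict` key and the remaining top-level keys holding training info. The local path `jsut_1000.ckpt` and the helper names are illustrative, not part of this repository.

```python
def summarize_checkpoint(ckpt):
    """Split a checkpoint dict into weight info and other top-level entries."""
    state_dict = ckpt.get("state_dict", {})
    return {
        "num_tensors": len(state_dict),
        "first_weight_names": list(state_dict)[:5],
        "other_keys": sorted(k for k in ckpt if k != "state_dict"),
    }


def load_checkpoint(path):
    """Load a checkpoint file onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so summarize_checkpoint works without it

    # weights_only=False allows non-tensor objects stored in the checkpoint;
    # only use it on files from a source you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


if __name__ == "__main__":
    # Assumed local path: wherever the downloaded checkpoint was saved.
    info = summarize_checkpoint(load_checkpoint("jsut_1000.ckpt"))
    print("Number of weight tensors:", info["num_tensors"])
    print("First weight names:", info["first_weight_names"])
    print("Other top-level keys:", info["other_keys"])
```

If the file does not use a `state_dict` key, the weight count will be zero and all keys will be listed as other entries, which still shows how the file is organized.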
Some audio samples are in asset/sample.
loss | plot |
---|---|
Discriminator | |
Generator | |
Feature Matching | |
Mel | |