Vocos

My implementation of Vocos (paper) for JSUT (link), powered by Lightning.

Requirements

pip install torch torchaudio lightning pandas matplotlib

or

docker image build -t vocos -f docker/Dockerfile .
docker container run --rm -it --gpus all -v "$(pwd)":/work vocos
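
Whichever route you take, a quick check that the packages are importable can save a failed run later. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository, and the package list is copied from the pip command above.

```python
from importlib.util import find_spec

# Package names taken from the pip command above.
REQUIRED = ["torch", "torchaudio", "lightning", "pandas", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not verify versions or GPU support.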

run.sh downloads the data automatically and then starts training, so the following commands are all you need:

cd scripts
./run.sh

The synthesis script uses last.ckpt by default. To synthesize with a different checkpoint, change the checkpoint it points to.

cd scripts
./synthesis.sh
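
When picking a checkpoint other than last.ckpt, it can help to see which ones exist first. The sketch below lists every .ckpt file under a directory, newest first. The function name `list_checkpoints` is hypothetical, and `lightning_logs` is only Lightning's default log directory; this repository may write checkpoints elsewhere, so adjust the path accordingly.

```python
from pathlib import Path


def list_checkpoints(root):
    """Return all .ckpt files under root (searched recursively), newest first."""
    paths = Path(root).rglob("*.ckpt")
    # Sort by modification time so the most recently written file comes first.
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumption: checkpoints live under Lightning's default "lightning_logs".
    for path in list_checkpoints("lightning_logs"):
        print(path)
```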

A trained model is available at the following link.

It contains the model weights along with some training information.

Some audio samples are in asset/sample.

