My implementation of iSTFTNet (paper) for JSUT (link), powered by Lightning.
Running run.sh automatically downloads the data and starts training, so the following commands are all you need:
cd scripts
./run.sh
The synthesis script uses last.ckpt by default. To synthesize with a different checkpoint, edit the script to point at that file.
cd scripts
./synthesis.sh
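If you are unsure which checkpoint to point the script at, a small helper like the one below can locate one. This is only a sketch: the function name `resolve_checkpoint` and the directory you pass to it are hypothetical, since the actual checkpoint directory depends on how the training run is configured. Only the file name last.ckpt comes from the text above.

```python
from pathlib import Path
from typing import Optional


def resolve_checkpoint(ckpt_dir: str, preferred: Optional[str] = None) -> Path:
    """Return the checkpoint file to load from ckpt_dir (a hypothetical directory).

    If `preferred` is given, that file must exist. Otherwise fall back to
    last.ckpt, and if that is missing too, to the most recently modified .ckpt.
    """
    directory = Path(ckpt_dir)

    if preferred is not None:
        candidate = directory / preferred
        if not candidate.is_file():
            raise FileNotFoundError(f"checkpoint not found: {candidate}")
        return candidate

    last = directory / "last.ckpt"
    if last.is_file():
        return last

    # No last.ckpt: pick the newest remaining checkpoint by modification time.
    others = sorted(directory.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    if not others:
        raise FileNotFoundError(f"no .ckpt files in {directory}")
    return others[-1]
```

Calling `resolve_checkpoint("path/to/checkpoints")` mirrors the default behaviour described above, while passing a second argument selects a specific file by name.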
pip install torch torchaudio lightning pandas
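To confirm the environment is ready before starting a long training run, a quick check along these lines may help. It is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration, and the import names are assumed to match the package names in the pip command above.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed to match the packages in the pip command above.
REQUIRED = ("torch", "torchaudio", "lightning", "pandas")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```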
Trained for 1000 epochs (612000 steps) with batch_size = 16.
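The numbers above imply how many steps make up one epoch, which is handy when estimating how long a shorter run would take. The sketch below just does that arithmetic; the helper name `steps_per_epoch` is hypothetical. How a step maps to batches depends on how the trainer counts steps, which is not stated here, so the sample estimate in the last line is an assumption rather than a dataset size.

```python
def steps_per_epoch(total_steps: int, epochs: int) -> int:
    """Average number of logged steps per epoch."""
    if epochs <= 0:
        raise ValueError("epochs must be positive")
    return total_steps // epochs


EPOCHS = 1000
TOTAL_STEPS = 612_000
BATCH_SIZE = 16

per_epoch = steps_per_epoch(TOTAL_STEPS, EPOCHS)
print(per_epoch)  # 612

# Assumption: every logged step consumes exactly one batch of BATCH_SIZE items.
# If the trainer counts several optimizer steps per batch, this overestimates.
print(per_epoch * BATCH_SIZE)  # 9792
```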
The pretrained checkpoint is available here: https://huggingface.co/reppy4620/istft_net_jsut/blob/main/jsut_1000.ckpt
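The link above opens the file page on Hugging Face. For a scripted download, replacing `/blob/` with `/resolve/` in the URL gives the direct file link. The following is a minimal standard-library sketch; the function names and the destination directory are hypothetical, and where the synthesis script expects the file to live is not specified here.

```python
import urllib.request
from pathlib import Path

CKPT_PAGE_URL = (
    "https://huggingface.co/reppy4620/istft_net_jsut/blob/main/jsut_1000.ckpt"
)


def to_download_url(page_url: str) -> str:
    """Turn a Hugging Face file-page URL (/blob/) into a direct link (/resolve/)."""
    if "/blob/" not in page_url:
        raise ValueError(f"not a file-page URL: {page_url}")
    return page_url.replace("/blob/", "/resolve/", 1)


def download_checkpoint(page_url: str = CKPT_PAGE_URL, dest_dir: str = ".") -> Path:
    """Download the checkpoint into dest_dir (a hypothetical location)."""
    url = to_download_url(page_url)
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))  # network access happens here
    return dest
```

Calling `download_checkpoint(dest_dir="checkpoints")` would save the file as checkpoints/jsut_1000.ckpt; adjust the directory to wherever your setup reads checkpoints from.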
Some audio samples are in asset/sample/
Loss | Plot |
---|---|
Discriminator | ![]() |
Generator | ![]() |
Feature Matching | ![]() |
Mel | ![]() |