This code implements the Subscale part of the WaveRNN paper on top of Fatchord's original implementation.
Please refer to the accompanying blog post for details of our interpretation: Subscale WaveRNN
Original publication: Efficient Neural Audio Synthesis
Initial implementation: Fatchord's Repo
Samples after training for 1M iterations on the Sharvard dataset with current hparams: samples
To train the model, use the command below:
CUDA_VISIBLE_DEVICES={your_gpus} python train.py --data ../data/{your_dataset} --expName {your_experiment}
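For example, to train on GPUs 0 and 1 (the experiment name below is illustrative):
CUDA_VISIBLE_DEVICES=0,1 python train.py --data ../data/Sharvard --expName subscale_baseline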
Note: the training script is multi-GPU by default. To train on a single GPU, set CUDA_VISIBLE_DEVICES to your preferred GPU.
The {your_dataset} folder is assumed to have the following structure:

- {args.data}/train/mel
- {args.data}/train/wav_24khz
- {args.data}/valid/mel
- {args.data}/valid/wav_24khz

If you are providing the features, the data loader will create a dataset based on the intersection of filenames present in the mel folder and the wav_24khz folder.
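A minimal sketch of that pairing logic, assuming utterances are matched by filename stem (the helper name and exact layout are assumptions, not the repo's actual loader code):

```python
from pathlib import Path

def paired_ids(data_root, split="train"):
    """Return utterance IDs present in BOTH the mel and wav_24khz folders."""
    mels = {p.stem for p in (Path(data_root) / split / "mel").iterdir()}
    wavs = {p.stem for p in (Path(data_root) / split / "wav_24khz").iterdir()}
    return sorted(mels & wavs)  # intersection: only fully paired files survive
```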
You can download a version of the Sharvard dataset with this folder structure here.
Extract with:
tar xvf Sharvard.tar.gz
To use TensorBoard within WaveRNN:
tensorboard --logdir tensorboard-runs --port {MY PORT}
Using gen_wavs.py:
CUDA_VISIBLE_DEVICES={} python gen_wavs.py --data {mel_directory} --checkpoint {checkpoint_path} --out_dir {out_folder}
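For example (GPU index and paths below are illustrative):
CUDA_VISIBLE_DEVICES=0 python gen_wavs.py --data ../data/Sharvard/valid/mel --checkpoint checkpoints/model.pt --out_dir ./generated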
The most fun hyperparameters to play around with are the three subscale parameters:
- batch_factor
- horizon
- lookback
You can also tweak the Condition Network. For intuition on what batch_factor controls, see the sketch below. If you run into any issues or find anything cool, let us know!
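As a rough illustration of the subscale idea from the paper, batch_factor splits the waveform into that many interleaved sub-signals that can be generated in parallel batches, while (in our reading of the scheme) horizon and lookback control how much future and past context from the other sub-signals conditions each prediction. The function below is a sketch for intuition only, not the repo's implementation:

```python
import numpy as np

def subscale_split(x, batch_factor):
    """Split a 1-D signal into `batch_factor` interleaved sub-signals.

    Sub-signal i holds samples x[i], x[i + B], x[i + 2B], ... for
    B = batch_factor, mirroring the subscale reshaping in the WaveRNN
    paper. Illustrative only; the repo's tensor layout may differ.
    """
    B = batch_factor
    n = len(x) - len(x) % B        # trim so the signal divides evenly into B
    return x[:n].reshape(-1, B).T  # shape: (B, n // B)

# Example: 4x subscale of a 16-sample signal.
# Row 0 is [0, 4, 8, 12], row 1 is [1, 5, 9, 13], etc.
print(subscale_split(np.arange(16), batch_factor=4))
```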