toannhu / FloWaveNet

A PyTorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"

FloWaveNet : A Generative Flow for Raw Audio

This is a PyTorch implementation of our work "FloWaveNet : A Generative Flow for Raw Audio".

For parallel sampling, we propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet can generate audio samples as fast as ClariNet and Parallel WaveNet, while its training procedure is simple and stable with a single-stage pipeline. Links to our generated audio samples and to our implementation of ClariNet (Gaussian WaveNet and Gaussian IAF) are given in the Sample Link section below.
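As a rough illustration of what "flow-based" means here: a flow is built from invertible transforms whose log-determinant is cheap to compute. The toy sketch below shows one common form of an affine coupling step on plain Python lists. It is not the code from this repository: the names `toy_scale_shift`, `coupling_forward`, and `coupling_inverse` are hypothetical, and the small hand-written scale/shift function is an assumed stand-in for the WaveNet-style network that the affine coupling layers of the model use.

```python
import math

def toy_scale_shift(x_a):
    """Hypothetical stand-in for the conditioning network.

    Returns a log-scale and a shift per element, computed only from x_a.
    """
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t

def coupling_forward(x_a, x_b):
    """Keep x_a unchanged and apply an affine transform to x_b."""
    log_s, t = toy_scale_shift(x_a)
    y_b = [b * math.exp(ls) + sh for b, ls, sh in zip(x_b, log_s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    log_det = sum(log_s)
    return x_a, y_b, log_det

def coupling_inverse(y_a, y_b):
    """Undo coupling_forward exactly; y_a equals x_a, so scale and shift are recomputable."""
    log_s, t = toy_scale_shift(y_a)
    x_b = [(b - sh) * math.exp(-ls) for b, ls, sh in zip(y_b, log_s, t)]
    return y_a, x_b

x_a, x_b = [0.3, -1.2, 0.8], [1.0, 0.5, -0.7]
y_a, y_b, log_det = coupling_forward(x_a, x_b)
_, x_b_rec = coupling_inverse(y_a, y_b)
print(x_b_rec, log_det)
```

Because both directions evaluate the scale/shift function once over the whole input, every element can be transformed at the same time, which is what makes parallel sampling possible in this family of models.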


Requirements

  • PyTorch 0.4.1
  • Python 3.6
  • Librosa


Step 1. Download Dataset

Step 2. Preprocessing (Preparing Mel Spectrogram)

python --in_dir ljspeech --out_dir DATASETS/ljspeech

Step 3. Train

python --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --causal no

Step 4. Synthesize

--load_step CHECKPOINT : the global training step of the pre-trained model to load (also indicated in the trained weight file)

--temp : temperature, used as the standard deviation of the prior when sampling, i.e. z ~ N(0, 1 * TEMPERATURE), where the second argument is the standard deviation

ex) python --model_name flowavenet --n_block 8 --n_flow 6 --n_layer 2 --causal no --load_step 100000 --temp 0.7 --num_samples 10
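To make the effect of the --temp option above concrete, here is a minimal sketch of temperature-scaled sampling from the prior. It is an illustration only, not the repository's code: `sample_prior` is a hypothetical helper, and the standard library is used in place of PyTorch so the snippet runs without extra dependencies.

```python
import random
import statistics

def sample_prior(num_samples, temperature, seed=None):
    """Draw z ~ N(0, 1) and scale it so that its standard deviation is `temperature`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * temperature for _ in range(num_samples)]

# With temperature 0.7 the empirical standard deviation should be close to 0.7.
z = sample_prior(10000, 0.7, seed=0)
print(statistics.pstdev(z))
```

A temperature of 1.0 samples from the unscaled standard normal, while 0.0 collapses every draw to zero; values in between trade sample diversity against the amount of noise passed through the flow.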

Sample Link

Sample Link :

Our implementation of ClariNet (Gaussian WaveNet, Gaussian IAF) :

  • Results 1 : Model Comparisons (WaveNet (MoL, Gaussian), ClariNet and FloWaveNet)

  • Results 2 : Temperature effect on Audio Quality Trade-off (Temperature T : 0.0 ~ 1.0, Model : Gaussian IAF and FloWaveNet)

  • Results 3 : Analysis of ClariNet Loss Terms (Loss functions : 1. KLD + Frame Loss 2. Only KL 3. Only Frame)

  • Results 4 : Context Block and Long term Dependency (FloWaveNet : 8 Context Blocks, FloWaveNet_small : 6 Context Blocks)

  • Results 5 : Causality of WaveNet Dilated Convolutions (FloWaveNet : Non-causal WaveNet Affine Coupling Layers, FloWaveNet_causal : Causal WaveNet Affine Coupling Layers)




License:MIT License


Language:Python 100.0%