kan-bayashi / PytorchWaveNetVocoder

WaveNet-Vocoder implementation with pytorch.

Home Page: https://kan-bayashi.github.io/WaveNetVocoderSamples/

Trying to train on nancy corpus subset

ghostcow opened this issue

Hi,

I'm trying to train a model on 200 wav files (100 for training, 100 for validation) from the Nancy corpus (Blizzard 2011 dataset). I modified the egs/arctic/sd/run.sh script to process my own files.

I get an error that seems to be related to the size of the batch tensor. Could it have something to do with upsampling?

Here is the full error log:

# train.py --n_gpus 1 --waveforms data/train/wav_ns.scp --feats data/train/feats.scp --stats data/train/stats.h5 --expdir exp/tr_nancy_16k_sd_nancy_lr1e-4_wd0.0_bl20000_bs1_ns_up --n_quantize 256 --n_aux 28 --n_resch 512 --n_skipch 256 --dilation_depth 10 --dilation_repeat 3 --lr 1e-4 --weight_decay 0.0 --iters 200000 --batch_length 20000 --batch_size 1 --checkpoints 10000 --use_speaker_code false --upsampling_factor 80 --resume
# Started at Wed Mar 14 20:48:13 UTC 2018
#
/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WaveNet(
  (onehot): OneHot(
  )
  (causal): CausalConv1d(
    (conv): Conv1d(256, 512, kernel_size=(2,), stride=(1,), padding=(1,))
  )
  (upsampling): UpSampling(
    (conv): ConvTranspose2d(1, 1, kernel_size=(1, 80), stride=(1, 80))
  )
  (dil_sigmoid): ModuleList(
    (0): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (1): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (2): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (3): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (4): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (5): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (6): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (7): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (8): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (9): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
    (10): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (11): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (12): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (13): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (14): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (15): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (16): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (17): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (18): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (19): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
    (20): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (21): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (22): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (23): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (24): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (25): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (26): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (27): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (28): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (29): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
  )
  (dil_tanh): ModuleList(
    (0): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (1): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (2): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (3): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (4): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (5): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (6): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (7): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (8): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (9): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
    (10): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (11): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (12): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (13): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (14): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (15): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (16): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (17): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (18): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (19): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
    (20): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(1,))
    )
    (21): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(2,), dilation=(2,))
    )
    (22): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(4,), dilation=(4,))
    )
    (23): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(8,), dilation=(8,))
    )
    (24): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(16,), dilation=(16,))
    )
    (25): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(32,), dilation=(32,))
    )
    (26): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(64,), dilation=(64,))
    )
    (27): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(128,), dilation=(128,))
    )
    (28): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(256,), dilation=(256,))
    )
    (29): CausalConv1d(
      (conv): Conv1d(512, 512, kernel_size=(2,), stride=(1,), padding=(512,), dilation=(512,))
    )
  )
  (aux_1x1_sigmoid): ModuleList(
    (0): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (1): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (2): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (3): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (4): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (5): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (6): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (7): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (8): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (9): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (10): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (11): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (12): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (13): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (14): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (15): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (16): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (17): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (18): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (19): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (20): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (21): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (22): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (23): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (24): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (25): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (26): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (27): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (28): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (29): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
  )
  (aux_1x1_tanh): ModuleList(
    (0): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (1): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (2): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (3): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (4): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (5): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (6): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (7): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (8): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (9): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (10): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (11): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (12): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (13): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (14): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (15): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (16): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (17): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (18): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (19): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (20): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (21): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (22): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (23): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (24): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (25): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (26): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (27): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (28): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
    (29): Conv1d(28, 512, kernel_size=(1,), stride=(1,))
  )
  (skip_1x1): ModuleList(
    (0): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (1): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (2): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (3): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (4): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (5): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (6): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (7): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (8): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (9): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (10): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (11): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (12): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (13): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (14): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (15): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (16): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (17): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (18): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (19): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (20): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (21): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (22): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (23): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (24): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (25): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (26): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (27): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (28): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
    (29): Conv1d(512, 256, kernel_size=(1,), stride=(1,))
  )
  (res_1x1): ModuleList(
    (0): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (1): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (2): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (3): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (4): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (5): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (6): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (7): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (8): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (9): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (10): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (11): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (12): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (13): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (14): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (15): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (16): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (17): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (18): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (19): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (20): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (21): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (22): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (23): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (24): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (25): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (26): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (27): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (28): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
    (29): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
  )
  (conv_post_1): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
  (conv_post_2): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
)
number of training data = 100.
batch length is decreased due to upsampling (20000 -> 19970)
Traceback (most recent call last):
  File "../../../src/bin/train.py", line 513, in <module>
    main()
  File "../../../src/bin/train.py", line 474, in main
    batch_output = model(batch_x, batch_h)
  File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/PytorchWaveNetVocoder/src/nets/wavenet.py", line 237, in forward
    self.skip_1x1[l], self.res_1x1[l])
  File "/home/ubuntu/PytorchWaveNetVocoder/src/nets/wavenet.py", line 511, in _residual_forward
    aux_output_sigmoid = aux_1x1_sigmoid(h)
  File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 168, in forward
    self.padding, self.dilation, self.groups)
  File "/home/ubuntu/PytorchWaveNetVocoder/tools/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 54, in conv1d
    return f(input, weight, bias)
RuntimeError: Given groups=1, weight[512, 28, 1], so expected input[1, 64, 23040] to have 28 channels, but got 64 channels instead
# Accounting: time=7 threads=1
# Ended (code 1) at Wed Mar 14 20:48:20 UTC 2018, elapsed time 7 seconds
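
For reference, the line "batch length is decreased due to upsampling (20000 -> 19970)" and the input length 23040 in the error message can be reproduced with simple arithmetic. The sketch below is not taken from train.py. It assumes the usual receptive-field formula for stacked dilated causal convolutions, and it assumes the segment (batch plus receptive field) is trimmed to a whole number of feature frames. Under those assumptions the numbers in the log come out exactly, which suggests the upsampling itself is behaving as intended.

```python
# Hyperparameters copied from the train.py command line in the log.
batch_length = 20000
upsampling_factor = 80
dilation_depth = 10
dilation_repeat = 3
kernel_size = 2  # matches kernel_size=(2,) in the printed model

# Dilations 1, 2, 4, ..., 512, repeated three times (30 layers).
dilations = [2 ** i for i in range(dilation_depth)] * dilation_repeat

# Assumed receptive-field formula for stacked dilated causal convolutions.
receptive_field = (kernel_size - 1) * sum(dilations) + 1

# Assumption: the segment fed to the network (batch + receptive field)
# is trimmed down to a whole number of feature frames.
total = batch_length + receptive_field
n_frames = total // upsampling_factor
trimmed_total = n_frames * upsampling_factor
trimmed_batch_length = trimmed_total - receptive_field

print(receptive_field, n_frames, trimmed_total, trimmed_batch_length)
# prints: 3070 288 23040 19970
```

The trimmed total of 23040 samples matches the last dimension of input[1, 64, 23040] in the error, so the mismatch is in the channel dimension (64 vs. 28), not in the time dimension.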

What could be the problem?

Changing the n_aux hyperparameter to 64, to match the 64 input channels reported in the error, solved the problem for now.
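
The error says the auxiliary 1x1 convolution was built for 28 input channels but received 64, so the extracted feature vectors appear to have 64 dimensions per frame. A small pre-flight check along these lines can catch the mismatch before the model is built. This is only a sketch: `check_aux_dim` is a hypothetical helper, not part of the repository, and the frames here are stand-in data rather than values read from the feats files.

```python
def check_aux_dim(feature_frames, n_aux):
    """Raise ValueError if any frame's length differs from n_aux."""
    dims = {len(frame) for frame in feature_frames}
    if dims != {n_aux}:
        raise ValueError(
            "feature dimension(s) %s do not match n_aux=%d"
            % (sorted(dims), n_aux)
        )
    return n_aux


# Stand-in data: 3 frames of 64-dimensional features. Real values would
# come from the feature files, which are not shown in this thread.
frames = [[0.0] * 64 for _ in range(3)]

check_aux_dim(frames, 64)  # passes silently

try:
    check_aux_dim(frames, 28)  # the value used in the failing run
except ValueError as err:
    print(err)
```

If the check fails, either n_aux or the feature extraction settings need to change so that the two agree.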