minzwon / sota-music-tagging-models

Problems training on jamendo - RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0

mathigatti opened this issue

Hi, thanks for this awesome project! I'm trying to train it with jamendo-moodtheme tags but I'm getting an error.

I'm trying it on a google colab VM with a cuda enabled GPU.

I downloaded the mel-spectrograms from the jamendo repository, specifying the melspecs data type and the autotagging_moodtheme dataset. Then, in this project, I just replaced the TAGS variables in the code with this and the tsv files with the moodtheme ones from here.
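
In case it is useful, this is roughly how I rebuilt the tag list from the moodtheme split files. The file name and the column layout (tags from the sixth column onward, tab-separated) are assumptions about my local copy, not something taken from this repo:

# Sketch: collect the unique mood/theme tags from an
# autotagging_moodtheme split file (file name and column layout assumed).
import csv

TAGS = set()
with open('autotagging_moodtheme-train.tsv') as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader)                # skip the header row
    for row in reader:
        TAGS.update(row[5:])    # e.g. 'mood/theme---happy'
TAGS = sorted(TAGS)
print(len(TAGS), 'tags')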

Everything looked fine but for some reason I'm receiving the attached error after running the training code.

The mel spectrograms have 92 bands and different lengths; maybe that is what's causing the problem?
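
For reference, a quick shape check on a couple of the downloaded .npy files (the paths below are just placeholders) shows the mismatch:

# Quick sanity check of two downloaded mel-spectrogram files.
# The paths are placeholders; only the shapes matter here.
import numpy as np

for path in ['melspecs/00/1234.npy', 'melspecs/01/5678.npy']:
    mel = np.load(path)
    print(path, mel.shape)      # e.g. (92, 20546) vs. (92, 9168)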

Let me know if anyone knows what might be the problem :)

Thanks in advance!

# My code
%tensorflow_version 1.x
%cd /content/sota-music-tagging-models/src/
!python -u main.py --data_path /content/data --dataset jamendo-mood

My error message

Namespace(batch_size=16, data_path='/content/data', dataset='jamendo-mood', log_step=20, lr=0.0001, model_load_path='.', model_save_path='./../models', model_type='hcnn', n_epochs=200, num_workers=0, use_tensorboard=1)
Traceback (most recent call last):
  File "main.py", line 61, in <module>
    main(config)
  File "main.py", line 39, in main
    solver.train()
  File "/content/sota-music-tagging-models/src/solver.py", line 172, in train
    for x, y in self.data_loader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 65, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 20546 and 9168 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Hi @mathigatti
This error occurs because you have audio of different lengths in a single batch. As the error says, the data loader got "20546 and 9168 in dimension 2". You need to crop them to the same length.
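
For example (just a sketch, not this repository's own loader), you could crop every item in a custom collate_fn to the shortest length in the batch, assuming each dataset item is a (spectrogram_tensor, label_tensor) pair with frames in the last dimension:

# Sketch of a collate function that crops every spectrogram in the
# batch to the length of the shortest one. Illustration only.
import torch

def crop_collate(batch):
    xs, ys = zip(*batch)
    min_len = min(x.shape[-1] for x in xs)
    xs = [x[..., :min_len] for x in xs]
    return torch.stack(xs, 0), torch.stack(ys, 0)

# loader = torch.utils.data.DataLoader(dataset, batch_size=16,
#                                      collate_fn=crop_collate)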

Before that, let me check one thing first. It looks like you are trying to use Mel spectrogram inputs. The models implemented in this repository take raw audio inputs and extract Mel spectrograms on the fly. So, please use raw audio inputs.
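
Just to illustrate what on-the-fly extraction from raw audio looks like with torchaudio (the parameter values below are illustrative, not necessarily the ones used in model.py):

# Rough illustration of extracting a mel spectrogram from raw audio
# on the fly. Parameter values are illustrative only.
import torch
import torchaudio

spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, hop_length=256, n_mels=128)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, 16000 * 5)   # 5 seconds of dummy audio
mel = to_db(spec(waveform))            # shape: (1, n_mels, frames)
print(mel.shape)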

If you want to use Mel spectrogram inputs with your own data loader, you need to modify model.py: you can simply remove self.spec and self.to_db.
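
Schematically, the change would look like this (only self.spec and self.to_db are actual attribute names mentioned in this thread; the rest of the forward pass is a placeholder):

# Simplified sketch of feeding precomputed mel spectrograms to the model.
def forward(self, x):
    # x is already a mel spectrogram, so skip the two front-end
    # transforms that expect raw audio:
    # x = self.spec(x)     # waveform -> mel spectrogram
    # x = self.to_db(x)    # amplitude -> dB
    x = self.remaining_layers(x)   # placeholder for the rest of the model
    return x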

It worked perfectly after downloading the mp3 files. Thank you very much!!