minzwon / sota-music-tagging-models

Bug running fcn with mtat

Sette opened this issue · comments

result = self.forward(*input, **kwargs)
RuntimeError: builtins: link error: Invalid value
The above operation failed in interpreter, with the following stack trace:

The above operation failed in interpreter, with the following stack trace:

Any idea what the problem is?

Can you share the entire code that you run and the entire error message, please? With this, I can't understand which part returned the error.

Namespace(batch_size=16, data_path='/home/bruno/data', dataset='mtat', log_step=20, lr=0.0001, model_load_path='.', model_save_path='./../models', model_type='fcn', n_epochs=200, num_workers=0, use_tensorboard=1)
Traceback (most recent call last):
File "main.py", line 59, in <module>
main(config)
File "main.py", line 37, in main
solver.train()
File "/home/bruno/git/sota-music-tagging-models/training/solver.py", line 169, in train
out = self.model(x)
File "/home/bruno/anaconda3/envs/sota/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/bruno/git/sota-music-tagging-models/training/model.py", line 51, in forward
x = self.to_db(x)
File "/home/bruno/anaconda3/envs/sota/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
RuntimeError: builtins: link error: Invalid value
The above operation failed in interpreter, with the following stack trace:

The above operation failed in interpreter, with the following stack trace:

Can you add these three lines in solver.py before out = self.model(x) in line 169?

print(x.shape)
print(type(x))
print(type(x[0][0][0]))

Then please share what they return. It looks like an input error.
Also, please double-check if your library versions are identical to the requirements.txt.
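One way to compare the installed versions against the pins is sketched below. It is only an illustration: the helper names `parse_pins` and `compare_with_installed` are hypothetical, the sample text contains just the `torch==1.2.0` pin quoted in this thread, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def compare_with_installed(pins):
    """Return {package: (pinned, installed)}; installed is None if missing."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


# Sample input: in practice, pass the contents of requirements.txt instead.
sample = "torch==1.2.0\n"
for name, (pinned, installed) in compare_with_installed(parse_pins(sample)).items():
    status = "OK" if pinned == installed else "MISMATCH"
    print(f"{name}: pinned {pinned}, installed {installed} -> {status}")
```

Any line reported as a mismatch is a candidate cause worth ruling out before debugging the model itself.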

Output:
torch.Size([16, 464000])
<class 'torch.Tensor'>
Traceback (most recent call last):
File "main.py", line 59, in <module>
main(config)
File "main.py", line 37, in main
solver.train()
File "/home/bruno/git/sota-music-tagging-models/training/solver.py", line 171, in train
print(type(x[0][0][0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
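The IndexError comes from the third debug line, not from the model: at this point x is 2-D (shape [16, 464000]), so x[0][0] is already a 0-dim tensor and cannot be indexed again. A minimal sketch of a debug helper that does not assume the number of dimensions follows; `describe_tensor` is a hypothetical name, and the small tensor is a stand-in for the real batch.

```python
import torch


def describe_tensor(x):
    """Summarize a tensor without assuming how many dimensions it has."""
    # flatten() gives a 1-D view/copy, so [0] is valid for any non-empty tensor.
    first = x.flatten()[0] if x.numel() > 0 else None
    return {
        "shape": tuple(x.shape),
        "dtype": x.dtype,
        "device": str(x.device),
        "first_element_type": type(first.item()) if first is not None else None,
    }


x = torch.zeros(2, 8)  # stand-in for the real input batch
print(describe_tensor(x))
```

The same call works unchanged on the 3-D spectrogram inside model.py.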

About the requirements: I get an error when installing PyTorch 1.2:
ERROR: Could not find a version that satisfies the requirement torch==1.2.0 (from -r requirements.txt (line 20)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1)
ERROR: No matching distribution found for torch==1.2.0 (from -r requirements.txt (line 20))

Okay, then remove those three lines from solver.py and paste them into model.py at line 51, before x = self.to_db(x). What do they return?

Yeah, maybe 1.2.0 is too old. What is the version of your torchaudio?

Output:
torch.Size([16, 96, 1813])
<class 'torch.Tensor'>
<class 'torch.Tensor'>

torchaudio version is 0.3.0

Okay, the input shape looks fine.
I suspect two more possible causes.

  1. Please check whether the input contains Inf or NaN. Remove the previous three lines and paste the following:
    print(torch.isnan(x).any())
    print(torch.isinf(x).any())

  2. Does this happen regardless of whether you use the CPU or the GPU? Sometimes an invalid value error is caused by the CUDA configuration. Please check whether it also happens when you run on your CPU.
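The NaN/Inf check from item 1 could be sketched with PyTorch's own `torch.isnan` and `torch.isinf`, which operate on the tensor wherever it lives (CPU or GPU). The helper name `count_bad_values` and the sample tensor are hypothetical; counting the entries is an addition for easier debugging, not something the repository does.

```python
import torch


def count_bad_values(x):
    """Return (number of NaN entries, number of Inf entries) in x."""
    n_nan = int(torch.isnan(x).sum())
    n_inf = int(torch.isinf(x).sum())
    return n_nan, n_inf


# Stand-in for the spectrogram batch passed to self.to_db(x).
x = torch.tensor([[1.0, float("nan")], [float("inf"), 0.5]])
n_nan, n_inf = count_bad_values(x)
print(f"NaN entries: {n_nan}, Inf entries: {n_inf}")
```

If both counts are zero, the input values themselves are unlikely to be the cause.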

How can I run it on the CPU?

You can control it in solver.py.

x = x.cpu() will send your input to the CPU (x.cpu() returns a tensor on the CPU rather than moving x in place, so reassign it), and self.model.cpu() will send your model to the CPU. Try them in line 165.
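A minimal sketch of forcing a forward pass onto the CPU is shown here. The `nn.Linear` model and the random batch are hypothetical stand-ins for self.model and the real input; only the device handling is the point.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # hypothetical stand-in for self.model
x = torch.randn(16, 8)   # hypothetical stand-in for the input batch

device = torch.device("cpu")
model = model.to(device)  # moves the parameters in place and returns the module
x = x.to(device)          # returns a tensor on the device, so reassign it

with torch.no_grad():
    out = model(x)
print(out.shape, out.device)
```

Both the model and the input must be on the same device; moving only one of them raises a device mismatch error.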

I ran it on the CPU and it worked. I believe the problem is in my CUDA and cuDNN configuration.

Yes, then you need to check your CUDA configuration.
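As a starting point for that check, the following sketch prints what PyTorch itself reports about its CUDA and cuDNN setup. `cuda_report` is a hypothetical helper name; the calls it wraps are standard PyTorch attributes and functions.

```python
import torch


def cuda_report():
    """Collect the CUDA-related settings visible to this PyTorch build."""
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if unavailable
        "device_count": torch.cuda.device_count(),
    }


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

Comparing `cuda_build` with the CUDA version supported by the installed GPU driver is one way to spot a mismatch.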