How to resolve “RuntimeError: CUDA error: device-side assert triggered”?

Question

JenniferYingyiWu2020 opened this issue 3 years ago

Hi qiaoguan,
I am very interested in the GitHub project “Person-reid-GAN-pytorch”. I followed the steps in the README.md file and downloaded the “Market-1501” dataset. Because I have only one GPU, I modified the code accordingly. However, when I run the command “python train_baseline.py --use_dense”, the following errors appear:

"12936
751
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:14: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.
init.kaiming_normal(m.weight.data, a=0, mode='fan_out')
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:15: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(m.bias.data, 0.0)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:17: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
init.normal(m.weight.data, 1.0, 0.02)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:18: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(m.bias.data, 0.0)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:23: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
init.normal(m.weight.data, std=0.001)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:24: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(m.bias.data, 0.0)
Epoch 0/12
/root/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
train_baseline.py:165: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
flos=F.log_softmax(input) # NK? batchsize751
train_baseline.py:167: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
logpt=F.log_softmax(input) # size: batchsize*751

Traceback (most recent call last):
File "train_baseline.py", line 349, in <module>
num_epochs=13)
File "train_baseline.py", line 251, in train_model
loss.backward()
File "/root/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/root/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered"

So, could you please give me some suggestions on how to resolve “RuntimeError: CUDA error: device-side assert triggered”? Many thanks!
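
One check that may help narrow this down: a device-side assert during a classification loss is often caused by a label outside the range [0, num_classes), although the log above does not confirm that this is the cause here. Because CUDA kernels run asynchronously, the traceback can point at `loss.backward()` rather than the line that actually failed; setting the environment variable `CUDA_LAUNCH_BLOCKING=1` or running one batch on the CPU usually gives a more precise message. The sketch below is only an illustration: the helper name `find_out_of_range_labels` is hypothetical, and the value 751 is an assumption taken from the class count printed in the log. It accepts any iterable of integers, for example `labels.tolist()` for a label tensor.

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs whose label lies outside [0, num_classes)."""
    return [
        (i, int(y))
        for i, y in enumerate(labels)
        if not 0 <= int(y) < num_classes
    ]


# Hypothetical batch of labels; with 751 classes the valid labels are 0..750.
batch_labels = [0, 17, 750, 751, -1]
bad = find_out_of_range_labels(batch_labels, num_classes=751)
print(bad)  # [(3, 751), (4, -1)]
```

If the check reports any entries, the label mapping (or the size of the final classification layer) would be the first thing to inspect.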