sacmehta / ESPNetv2

A light-weight, power efficient, and general purpose convolutional neural network


RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 136 and 135 in dimension 2 at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/generic/THCTensorMath.cu:87

tokyokuma opened this issue

Thank you for developing ESPNet!
I have three questions:

・ About labels to be ignored
My dataset has 11 classes excluding the background, and the background pixels are labeled 255.
So in DataSet.py and loadData.py I changed
label_img[label_img == 255] = 19
to
label_img[label_img == 255] = 11
and ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --data_dir ./izunuma --classes 11 --batch_size 10 --s 1.0

Labels can take value between 0 and number of classes 10.
You have following values as class labels:
[0 1 11]
Some problem with labels. Please check image file

I encountered this error.
To solve it, I included the background in the number of classes and passed 12 to the --classes argument. Is this the correct solution?
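For reference, this is roughly what my remapping looks like now (a small NumPy sketch; the example array is made up, not taken from my actual annotations):

import numpy as np

# Made-up example label map: 11 foreground classes (0-10) and
# background stored as 255 in the annotation images.
label_img = np.array([[0, 1, 255],
                      [255, 4, 10]], dtype=np.uint8)

# With --classes 12 (11 foreground classes + background), remap the
# 255 background pixels to the last valid class index, 11.
label_img[label_img == 255] = 11

print(label_img)
# [[ 0  1 11]
#  [11  4 10]]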

・About errors during learning
I set the number of classes as described above and started training, but the following error occurred.
Image size: width 640, height 360
~/github/ESPNetv2/segmentation$ CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --data_dir ./izunuma --classes 12 --batch_size 10 --s 1.0
Total network parameters: 338430
Data statistics
[131.47914 144.9144 134.75436] [76.80522 68.83018 71.792274]
[ 9.698015 9.98296 7.912603 8.275558 3.726631 10.492059
10.192185 4.4507203 10.4207115 10.338895 10.329051 1.9822153]
Learning rate: 0.0005
Traceback (most recent call last):
File "main.py", line 263, in
trainValidateSegmentation(parser.parse_args())
File "main.py", line 200, in trainValidateSegmentation
train(args, trainLoader_scale1, model, criteria, optimizer, epoch)
File "/home/nouki/github/ESPNetv2/segmentation/train_utils.py", line 89, in train
output1, output2 = model(input)
File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/nouki/github/ESPNetv2/segmentation/cnn/SegmentationModel.py", line 62, in forward
merge_l2 = self.project_l2(torch.cat([out_l2, out_up_l3], 1))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 136 and 135 in dimension 2 at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/generic/THCTensorMath.cu:87

I have not had time to learn PyTorch properly, so I could not work out what this error means.

・Code mistake?
Perhaps this part of the code is wrong.
loadData.py line 41:
label_img[label_img = 255] = 19
should be
label_img[label_img == 255] = 19

Similarly, in DataSet.py:
label[label = 255] = 19
should be
label[label == 255] = 19

Thank you!

  1. If you have 11 classes including the background, then the mapping is from 0 to 10 because Python indexing starts from 0. I think your fix works, but make sure that you are mapping the 255-valued pixels correctly.

  2. Your input dimensions should be divisible by 16. Please ensure that you are feeding images with the correct sizes (see the sketch after this list).

  3. Thanks for noting the typo. I have corrected it.
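
To illustrate point 2: the failing line in SegmentationModel.py concatenates two branch outputs along the channel dimension, and torch.cat requires every other dimension to match exactly. If the input height or width is not a multiple of the network's downsampling factor, one branch can come out a pixel smaller than the other, which is exactly the mismatch in the reported error. A minimal sketch (the tensor shapes below are illustrative, not the real network shapes):

import torch

# Spatial sizes one pixel apart, as happens when the input
# height/width is not a multiple of the downsampling factor.
out_l2 = torch.randn(1, 64, 136, 240)     # one branch: H = 136
out_up_l3 = torch.randn(1, 64, 135, 240)  # upsampled branch: H = 135

try:
    merge_l2 = torch.cat([out_l2, out_up_l3], 1)  # concatenate along channels
except RuntimeError as err:
    print(err)  # sizes must match in every dimension except dim 1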

Thank you!
I will check that the image size is correct.

I tried height 768 and width 432 (both divisible by 16).
However, when I run the following command, a similar error still occurs:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --inWidth 768 --inHeight 432 --data_dir ./original_dataset --classes 12 --batch_size 5 --s 1.0

Next, training started when I used the default values (512, 1024) without specifying the height and width. The images in my dataset are still 768, 432.
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --data_dir ./original_dataset --classes 12 --batch_size 5 --s 1.0

What is going on?

If you look at the main file, we have data loaders at different scales; that is why you are seeing this size mismatch. Try changing the scale parameters in the data loaders.
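
As a quick check, you can multiply the requested --inWidth/--inHeight by each scale the data loaders use and verify that the results are still divisible by 16. A rough sketch (the scale values below are placeholders; please check the ones actually set for trainLoader_scale1, trainLoader_scale2, ... in main.py):

def check_scaled_sizes(in_width, in_height, scales, divisor=16):
    # Report the input size each scaled data loader would produce
    # and flag any that the network cannot handle.
    for s in scales:
        w, h = int(in_width * s), int(in_height * s)
        ok = (w % divisor == 0) and (h % divisor == 0)
        status = "OK" if ok else f"NOT divisible by {divisor}"
        print(f"scale {s}: {w} x {h} -> {status}")

# Placeholder scales for illustration only.
check_scaled_sizes(768, 432, scales=[0.5, 0.75, 1.0, 1.25, 1.5])
# e.g. 432 * 1.5 = 648 and 648 % 16 != 0, so that loader would hit the
# same size-mismatch error even though 432 itself is divisible by 16.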