RuntimeError: The expanded size of the tensor must match the existing size at non-singleton dimension
david-littlefield opened this issue · comments
Hello @li-js ,
I've been troubleshooting this error for a while now, but I'm unsure what is causing it. The error occurs after iterating through a dozen or so images. Eventually, one of the tensors is off by 1, and the program crashes. At first, I thought it was a rounding error caused by the `outS` function in the `model_solver` file, but experimenting with it didn't fix the error. Recently, I noticed the tensors are different sizes on lines 267 and 271 in the `train.py` file, but I still don't know what is causing the problem. Also, I'm running on a CPU, so testing takes a while, but I'll have access to a GPU within the next week or so.
```
Training Epoch 0/18 (3199.0 batches) ..: 0%| | 11/600
Traceback (most recent call last):
  File "train.py", line 272, in <module>
    loss_tag_grp = myInstanceLoss.myInstanceLoss_group(tag, tag_target, ignore_index=None)
  File "libs/myInstanceLoss.py", line 118, in myInstanceLoss_group
    inst = pred[mask.expand_as(pred)].view(C,-1,1) #c x -1 x 1
RuntimeError: The expanded size of the tensor (76) must match the existing size (77) at non-singleton dimension 1. Target sizes: [8, 76, 76]. Tensor sizes: [1, 77, 76]
```
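For reference, here is a minimal sketch that reproduces this failure mode, using the shapes from the traceback (the variable names are illustrative, not the repo's): `expand_as` can only broadcast singleton dimensions, so a 77-pixel dimension cannot be expanded to match a 76-pixel one.

```python
import torch

# Shapes taken from the traceback: pred is C x H x W, mask is 1 x H' x W.
pred = torch.randn(8, 76, 76)                   # target sizes: [8, 76, 76]
mask = torch.ones(1, 77, 76, dtype=torch.bool)  # tensor sizes: [1, 77, 76]

# Dim 0 is a singleton (1 -> 8 is fine), but dim 1 is 77 vs 76:
# expand_as raises the RuntimeError seen above.
try:
    mask.expand_as(pred)
except RuntimeError as e:
    print("RuntimeError:", e)

# With matching non-singleton dims, the expansion succeeds.
mask_ok = torch.ones(1, 76, 76, dtype=torch.bool)
print(mask_ok.expand_as(pred).shape)  # torch.Size([8, 76, 76])
```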
I think the most probable cause is a difference in the versions of the libraries used. I have specified the versions of the key libraries in `eval.py`, e.g.: https://github.com/li-js/MHPM/blob/master/eval.py#L3
Otherwise, I guess it would not be a big problem if the `tag_target`'s dimensions are adjusted a little (by trimming) to fit the dimensions of `tag`.
Thanks, @li-js. I tried your suggestion, but none of the versions listed in `eval.py` could be installed. Either it wasn't offered by pip, it crashed during installation, or it conflicted with another module.
Is trimming like cropping or slicing an image? Or would I need to convert the tensor to an image, crop it to fit, and then convert it back to a tensor?
The project was done some time ago, and the libraries do change a lot over time.
Trimming is like cropping an image. It can be done directly on the tensor, something like `tensor2 = tensor[:, 0:76, :]`.
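A rough sketch of that trimming idea, using the shapes from the traceback (the `tag`/`tag_target` names follow the thread; this is a sketch, not code from the repo):

```python
import torch

tag = torch.randn(8, 76, 76)         # prediction: C x H x W
tag_target = torch.randn(1, 77, 76)  # target, one row too tall

# Slice each spatial dimension down to the smaller of the two sizes,
# so a later mask.expand_as(pred) sees matching non-singleton dims.
h = min(tag.shape[1], tag_target.shape[1])
w = min(tag.shape[2], tag_target.shape[2])
tag_target = tag_target[:, :h, :w]

print(tag_target.shape)  # torch.Size([1, 76, 76])
```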
Thanks, @li-js. You're really helping me out. I've taken dozens of machine learning/computer vision courses, but it still feels quite intimidating and overwhelming.
I was able to run the `eval.py` file; it created a `pkl` file and thousands of `pkz` files.
I attempted to use the pre-trained model to make predictions on a new image, but it didn't turn out well. I've never implemented a repository without detailed instructions, so I don't know if the results are a reflection of my error or the model. [Code]
refine_global = False:
refine_global = True:
I also made predictions on an image from the training set, but the results were similar.
How do you get results like the ones in the paper?
I spotted several issues in your code.
- It seems some weights are not properly loaded. Please use `model.load_state_dict(checkpoint, strict=True)` to make sure every layer's weights are loaded.
- `model.eval()` should be used, as the behaviors of eval and train modes are different.
- The pretrained models are for weight initialization in model training; please use the "Models for deployment" to see the predictions.
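A minimal sketch of the first two fixes (the model and checkpoint here are stand-ins, not the repo's; in practice the checkpoint would come from `torch.load(...)`):

```python
import torch
import torch.nn as nn

# Placeholder model; substitute the repo's network.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# Stand-in for: checkpoint = torch.load("path/to/model.pth")
checkpoint = model.state_dict()

# strict=True raises a RuntimeError on any missing or unexpected key,
# so silently unloaded layers become a hard error instead of bad output.
model.load_state_dict(checkpoint, strict=True)

# eval() switches BatchNorm/Dropout to inference behavior.
model.eval()
print(model.training)  # False
```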
@li-js I've learned a bit more since my last message.
- I've downloaded the images.
- Created a val.txt file using the train.txt file.
- Run the global segment and tag files.
But the `pidx` from the `get_det_mask` function seems to always be 0, so the detections list is empty. This causes an index error in the `eval_seg_ap` function.
It seems like `pidx` is determined by the `instance_map_torch`, which is determined by the `clustering` function.
Would you happen to know why this is happening?
Please make sure the input image and the model weights are properly loaded. The image can be visualized to double-check; the model weights can be checked by comparing them with the weights in the trained model.
@li-js The images seem to load from the data loader, but I don't know how to compare the weights.
The models appeared to load because each model printed, but each print output was different.
I ran the `eval.py` file using the two models in the models folder, but the same error occurred.
As instructed, I updated the following code:
```python
saved_state_dict = torch.load(args.trained_model)
model.load_state_dict(saved_state_dict, strict=True)
```
I used the following code to check the weights:
```python
# PRINT WEIGHTS
for parameter in model.parameters():
    print(parameter.data)
```
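One way to compare weights programmatically rather than eyeballing printed tensors is to check each entry of the model's state dict against the checkpoint. This is a sketch with a placeholder model; in the real script, `model` and `saved_state_dict` would come from the code above.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder for the repo's network

# Stand-in for: saved_state_dict = torch.load(args.trained_model)
saved_state_dict = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(saved_state_dict)

# torch.equal checks exact element-wise equality of each tensor.
for name, param in model.state_dict().items():
    match = torch.equal(param, saved_state_dict[name])
    print(f"{name}: {'OK' if match else 'MISMATCH'}")
```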
I used the following code to check the image:
```python
# SHOW IMAGE ONCE
first_time = True
for i_iter, batch in enumerate(testloader):
    # VERIFY THE IMAGE LOADS
    if first_time:
        image = batch["image"][0].permute(1, 2, 0).numpy()
        cv2.imshow("IMAGE", image)
        key = cv2.waitKey(0)
        if key == ord("q"):
            break
        cv2.destroyAllWindows()
        first_time = False
    ...
```
But the following error occurs:
```
Traceback (most recent call last): | 0/400 [00:00<?, ?it/s]
  File "eval.py", line 264, in <module>
    ap_, pcp_ = eval_metrics_v2.eval_seg_ap(results_all, dat_list, ovthresh_seg=thr, From_pkl=False)
  File "libs/evaluate/eval_metrics_v2.py", line 66, in eval_seg_ap
    BB = BB[sorted_ind, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
```
Please check out https://github.com/li-js/MHPM/blob/master/forward_single.py for an example of predicting a single image.
@li-js This looks awesome, thank you so much for doing that! Will try it out after I finish this other project! 😄
@li-js Thank you again! It works using the newest versions of the requirements on MacOS, Windows 10, and Ubuntu! 🥳🎉
The only change that was needed was on line 78, replacing:
```python
from pylab import plt
```
with:
```python
import matplotlib.pyplot as plt
```