li-js / MHPM


RuntimeError: The expanded size of the tensor must match the existing size at non-singleton dimension

david-littlefield opened this issue

Hello @li-js ,

I've been troubleshooting this error for a while now, but I'm unsure what is causing it. The error occurs after iterating through a dozen or so images: eventually, one of the tensors is off by 1 and the program crashes. At first, I thought it was a rounding error caused by the outS function in the model_solver file, but experimenting with it didn't fix the error. Recently, I noticed the tensors are different sizes on lines 267 and 271 of the train.py file, but I still don't know what is causing the problem. Also, I'm running on a CPU, so testing takes a while; I'll have access to a GPU within the next week.

Training Epoch 0/18 (3199.0 batches) ..:   0%| | 11/600
Traceback (most recent call last):
  File "train.py", line 272, in <module>
    loss_tag_grp = myInstanceLoss.myInstanceLoss_group(tag, tag_target, ignore_index=None)
  File "libs/myInstanceLoss.py", line 118, in myInstanceLoss_group
    inst = pred[mask.expand_as(pred)].view(C,-1,1) #c x -1 x 1
RuntimeError: The expanded size of the tensor (76) must match the existing size (77) at non-singleton dimension 1.  Target sizes: [8, 76, 76].  Tensor sizes: [1, 77, 76]
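
For reference, here is a minimal sketch that reproduces the same mismatch (shapes taken from the traceback): expand_as can broadcast a singleton dimension (the 1 in [1, 77, 76] up to 8), but a non-singleton dimension (77 vs. 76) must match exactly.

import torch

# Shapes taken from the traceback above.
pred = torch.randn(8, 76, 76)
mask = torch.ones(1, 77, 76, dtype=torch.bool)

# Dimension 0 (1 -> 8) broadcasts fine, but dimension 1 (77 vs. 76) cannot:
mask.expand_as(pred)  # RuntimeError at non-singleton dimension 1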

I think the most probable cause is a difference in the versions of the libraries used. I have specified the versions of the key libraries in eval.py, e.g.: https://github.com/li-js/MHPM/blob/master/eval.py#L3

Otherwise, I guess it would not be a big problem to adjust tag_target's dimensions a little bit (by trimming) to fit the dimensions of tag.

Thanks, @li-js. I tried your suggestion, but none of the versions listed in eval.py could be installed. Either it wasn't offered by pip, it crashed during installation, or it conflicted with another module.

Is trimming like cropping or slicing an image? Or, would I need to convert the tensor to an image, crop it to fit, and then convert it back to a tensor?

The project was done some time ago and the libraries do change a lot over time.

Trimming is like cropping an image. I guess it can be done directly on the tensor, something like tensor2 = tensor[:, 0:76, :]
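
As a concrete sketch of that suggestion, using the shapes from the traceback (the sizes to trim to would come from the actual batch):

import torch

# Shapes from the traceback: tag_target has one extra row compared to tag.
tag = torch.randn(8, 76, 76)
tag_target = torch.randn(1, 77, 76)

# Trim tag_target's spatial dimensions down to match tag's before the loss.
_, h, w = tag.shape
tag_target = tag_target[:, :h, :w]
print(tag_target.shape)  # torch.Size([1, 76, 76])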

Thanks, @li-js. You're really helping me out. I've taken dozens of machine learning/computer vision courses, but it still feels quite intimidating and overwhelming.

I was able to run the eval.py file - it created a pkl file and thousands of pkz files.

I attempted to use the pre-trained model to make predictions on a new image, but the results didn't turn out well. I've never implemented a repository without detailed instructions, so I don't know if the results reflect my error or the model. [Code]

refine_global = False:

[screenshot]

refine_global = True:

[screenshot]

I also made predictions on an image from the training set, but the results were similar.

How do you get results like the ones in the paper?

[screenshot]

I spotted several issues in your code.

  1. It seems some weights are not properly loaded. Please use model.load_state_dict(checkpoint, strict=True) to make sure every layer's weights are loaded (see the sketch after this list).
  2. model.eval() should be used, as the behaviors of eval and train are different.
  3. The pretrained models are for weight initialization in model training; please use the "Models for deployment" to see the predictions.
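
A minimal sketch combining points 1 and 2; the tiny network and the checkpoint path below are placeholders, not the actual MHPM model:

import torch
import torch.nn as nn

# Placeholder network; substitute the actual model being evaluated.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))

# strict=True raises an error if any checkpoint key is missing or unexpected,
# instead of silently leaving those layers randomly initialized.
checkpoint = torch.load("deployment_model.pth", map_location="cpu")  # placeholder path
model.load_state_dict(checkpoint, strict=True)

# eval() switches layers like BatchNorm and Dropout to inference behavior.
model.eval()

with torch.no_grad():
    pred = model(torch.randn(1, 3, 64, 64))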

@li-js I've learned a bit more since my last message.

  1. Downloaded the images.
  2. Created a val.txt file using the train.txt file.
  3. Ran the global segment and tag files.

But, the pidx from the get_det_mask function seems to always be 0, so the detections list is empty. This causes an index error in the eval_seg_ap function.

It seems like pidx is determined by the instance_map_torch, which is determined by the clustering function.

Would you happen to know why this is happening?

Please make sure the input image and the model weights are properly loaded. The image can be visualized to double-check, and the model weights can be checked by comparing them with the weights in the trained model.

@li-js The images seem to load from the data loader. But, I don't know how to compare the weights.

The models appeared to load because each model's weights printed. But, each printout was different.

I ran the eval.py file using the two models in the models folder, but the same error occurred.

As instructed, I updated the following code:

saved_state_dict = torch.load(args.trained_model)
model.load_state_dict(saved_state_dict, strict=True)

I used the following code to check the weights:

# PRINT WEIGHTS
for parameter in model.parameters():
    print(parameter.data)
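
A minimal sketch for comparing the weights of two checkpoints key by key, rather than eyeballing printed values (the file names are placeholders):

import torch

# Placeholder paths; point these at the two checkpoints to compare.
a = torch.load("pretrained_init.pth", map_location="cpu")
b = torch.load("deployment_model.pth", map_location="cpu")

# Compare tensors that exist in both state dicts, key by key.
for key in sorted(a.keys() & b.keys()):
    same = torch.equal(a[key], b[key])
    print(f"{key}: {'identical' if same else 'different'}")

# Keys present in only one checkpoint indicate missing or extra layers.
print("only in first:", sorted(a.keys() - b.keys()))
print("only in second:", sorted(b.keys() - a.keys()))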

I used the following code to check the image:

# SHOW IMAGE ONCE
first_time = True

for i_iter, batch in enumerate(testloader):

    # VERIFY THE IMAGE LOADS
    if first_time:
        image = batch["image"][0].permute(1, 2, 0).numpy()
        cv2.imshow("IMAGE", image)
        key = cv2.waitKey(0)
        cv2.destroyAllWindows()  # close the window even when quitting below
        if key == ord("q"):
            break
        first_time = False
    ...

But, the following error occurs:

Traceback (most recent call last):
  File "eval.py", line 264, in <module>
    ap_, pcp_ = eval_metrics_v2.eval_seg_ap(results_all, dat_list, ovthresh_seg=thr, From_pkl=False)
  File "libs/evaluate/eval_metrics_v2.py", line 66, in eval_seg_ap
    BB = BB[sorted_ind, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
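
The IndexError is consistent with an empty detections list: BB ends up as an empty 1-D array, so 2-D indexing fails. A tiny reproduction of just that failure mode:

import numpy as np

# With no detections, BB is a 1-D empty array, so 2-D indexing fails.
BB = np.array([])
sorted_ind = np.argsort(np.array([]))
BB = BB[sorted_ind, :]  # IndexError: too many indices for array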

Please check out https://github.com/li-js/MHPM/blob/master/forward_single.py for an example of predicting a single image.

@li-js This looks awesome, thank you so much for doing that! Will try it out after I finish this other project! 😄

@li-js Thank you again! It works using the newest versions of the requirements on MacOS, Windows 10, and Ubuntu! 🥳🎉

The only change that was needed was on line 78, replacing:

from pylab import plt

with:

import matplotlib.pyplot as plt