RuntimeError: The expanded size of the tensor must match the existing size at non-singleton dimension
david-littlefield opened this issue · comments
Hello @li-js ,
I've been troubleshooting this error for a while now, but I'm unsure what is causing it. The error occurs after iterating through a dozen or so images. Eventually, one of the tensors is off by 1, and the program crashes. At first, I thought it was a rounding error caused by the `outS` function in the `model_solver` file, but experimenting with it didn't fix the error. Recently, I noticed the tensors are different sizes on lines 267 and 271 in the `train.py` file, but I still don't know what is causing the problem. Also, I'm running on a CPU, so testing takes a while, but I'll have access to a GPU within the next week or so.
```
Training Epoch 0/18 (3199.0 batches) ..: 0%| | 11/600
Traceback (most recent call last):
  File "train.py", line 272, in <module>
    loss_tag_grp = myInstanceLoss.myInstanceLoss_group(tag, tag_target, ignore_index=None)
  File "libs/myInstanceLoss.py", line 118, in myInstanceLoss_group
    inst = pred[mask.expand_as(pred)].view(C,-1,1) #c x -1 x 1
RuntimeError: The expanded size of the tensor (76) must match the existing size (77) at non-singleton dimension 1. Target sizes: [8, 76, 76]. Tensor sizes: [1, 77, 76]
```
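For reference, here is a minimal sketch that reproduces this failure mode, using the shapes from the traceback (the variable names are illustrative, not the repo's): `expand_as` can only broadcast singleton dimensions, so a 77-pixel dimension cannot be expanded to match a 76-pixel one.

```python
import torch

# Shapes taken from the traceback: pred is C x H x W, mask is 1 x H' x W.
pred = torch.randn(8, 76, 76)                   # target sizes: [8, 76, 76]
mask = torch.ones(1, 77, 76, dtype=torch.bool)  # tensor sizes: [1, 77, 76]

# Dim 0 is a singleton (1 -> 8 is fine), but dim 1 is 77 vs 76:
# expand_as raises the RuntimeError seen above.
try:
    mask.expand_as(pred)
except RuntimeError as e:
    print("RuntimeError:", e)

# With matching non-singleton dims, the expansion succeeds.
mask_ok = torch.ones(1, 76, 76, dtype=torch.bool)
print(mask_ok.expand_as(pred).shape)  # torch.Size([8, 76, 76])
```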
I think the most probable cause is a difference in the versions of the libraries used. I have specified the versions of the key libraries in `eval.py`, e.g.: https://github.com/li-js/MHPM/blob/master/eval.py#L3
Otherwise, I guess it would not be a big problem if the `tag_target`'s dimensions are adjusted a little (by trimming) to fit the dimensions of `tag`.
Thanks, @li-js. I tried your suggestion, but none of the versions listed in `eval.py` could be installed. Either it wasn't offered by pip, it crashed during installation, or it conflicted with another module.
Is trimming like cropping or slicing an image? Or would I need to convert the tensor to an image, crop it to fit, and then convert it back to a tensor?
The project was done some time ago, and the libraries do change a lot over time.
Trimming is like cropping an image. It can be done directly on the tensor, something like `tensor2 = tensor[:, 0:76, :]`.
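A rough sketch of that trimming idea, using the shapes from the traceback (the `tag`/`tag_target` names follow the thread; this is a sketch, not code from the repo):

```python
import torch

tag = torch.randn(8, 76, 76)         # prediction: C x H x W
tag_target = torch.randn(1, 77, 76)  # target, one row too tall

# Slice each spatial dimension down to the smaller of the two sizes,
# so a later mask.expand_as(pred) sees matching non-singleton dims.
h = min(tag.shape[1], tag_target.shape[1])
w = min(tag.shape[2], tag_target.shape[2])
tag_target = tag_target[:, :h, :w]

print(tag_target.shape)  # torch.Size([1, 76, 76])
```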
Thanks, @li-js. You're really helping me out. I've taken dozens of machine learning/computer vision courses, but it still feels quite intimidating and overwhelming.
I was able to run the `eval.py` file; it created a `pkl` file and thousands of `pkz` files.
I attempted to use the pre-trained model to make predictions on a new image, but it didn't turn out well. I've never implemented a repository without detailed instructions, so I don't know if the results are a reflection of my error or the model. [Code]
refine_global = False:
refine_global = True:
I also made predictions on an image from the training set, but the results were similar.
How do you get results like the ones in the paper?
I spotted several issues in your code.
- It seems some weights are not properly loaded. Please use `model.load_state_dict(checkpoint, strict=True)` to make sure every layer's weights are loaded.
- `model.eval()` should be used, as the behaviors of eval and train modes are different.
- The pretrained models are for weight initialization in model training; please use the "Models for deployment" to see the predictions.
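A minimal sketch of the first two fixes (the model and checkpoint here are stand-ins, not the repo's; in practice the checkpoint would come from `torch.load(...)`):

```python
import torch
import torch.nn as nn

# Placeholder model; substitute the repo's network.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# Stand-in for: checkpoint = torch.load("path/to/model.pth")
checkpoint = model.state_dict()

# strict=True raises a RuntimeError on any missing or unexpected key,
# so silently unloaded layers become a hard error instead of bad output.
model.load_state_dict(checkpoint, strict=True)

# eval() switches BatchNorm/Dropout to inference behavior.
model.eval()
print(model.training)  # False
```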
@li-js I've learned a bit more since my last message.
- I've downloaded the images.
- Created a val.txt file using the train.txt file.
- Run the global segment and tag files.
But the `pidx` from the `get_det_mask` function seems to always be 0, so the detections list is empty. This causes an index error in the `eval_seg_ap` function.
It seems like `pidx` is determined by the `instance_map_torch`, which is determined by the `clustering` function.
Would you happen to know why this is happening?
Please make sure the input image and the model weights are properly loaded. The image can be visualized to double-check; the model weights can be checked by comparing them with the weights in the trained model.
@li-js The images seem to load from the data loader, but I don't know how to compare the weights.
The models appeared to load because each model printed, but each print output was different.
I ran the `eval.py` file using the two models in the models folder, but the same error occurred.
As instructed, I updated the following code:
```python
saved_state_dict = torch.load(args.trained_model)
model.load_state_dict(saved_state_dict, strict=True)
```
I used the following code to check the weights:
```python
# PRINT WEIGHTS
for parameter in model.parameters():
    print(parameter.data)
```
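One way to compare weights programmatically rather than eyeballing printed tensors is to check each entry of the model's state dict against the checkpoint. This is a sketch with a placeholder model; in the real script, `model` and `saved_state_dict` would come from the code above.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder for the repo's network

# Stand-in for: saved_state_dict = torch.load(args.trained_model)
saved_state_dict = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(saved_state_dict)

# torch.equal checks exact element-wise equality of each tensor.
for name, param in model.state_dict().items():
    match = torch.equal(param, saved_state_dict[name])
    print(f"{name}: {'OK' if match else 'MISMATCH'}")
```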
I used the following code to check the image:
```python
# SHOW IMAGE ONCE
first_time = True
for i_iter, batch in enumerate(testloader):
    # VERIFY THE IMAGE LOADS
    if first_time:
        image = batch["image"][0].permute(1, 2, 0).numpy()
        cv2.imshow("IMAGE", image)
        key = cv2.waitKey(0)
        if key == ord("q"):
            break
        cv2.destroyAllWindows()
        first_time = False
    ...
```
But the following error occurs:
```
Traceback (most recent call last): | 0/400 [00:00<?, ?it/s]
  File "eval.py", line 264, in <module>
    ap_, pcp_ = eval_metrics_v2.eval_seg_ap(results_all, dat_list, ovthresh_seg=thr, From_pkl=False)
  File "libs/evaluate/eval_metrics_v2.py", line 66, in eval_seg_ap
    BB = BB[sorted_ind, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
```
Please check out https://github.com/li-js/MHPM/blob/master/forward_single.py for an example of predicting a single image.
@li-js This looks awesome, thank you so much for doing that! Will try it out after I finish this other project! 😄
@li-js Thank you again! It works using the newest versions of the requirements on MacOS, Windows 10, and Ubuntu! 🥳🎉
The only change that was needed was on line 78, replacing:
```python
from pylab import plt
```
with:
```python
import matplotlib.pyplot as plt
```