Training with PyTorch 1.2 raises errors in yolo_layer.py

Question

xuezhongcailian opened this issue 4 years ago

xuezhongcailian commented 4 years ago

Hi, can this repo be trained with PyTorch 1.2?

xuezhongcailian · Answer 1 · Fri Dec 13 2019 11:04:06 GMT+0800 (China Standard Time)

I get this error: "one of the variables needed for gradient computation has been modified by an inplace operation"

Motoki Kimura · Answer 2 · Tue Dec 17 2019 20:58:41 GMT+0800 (China Standard Time)

Hi, I have never tried torch 1.2 in this repo.
Can you try with torch 1.0.0 as written in requirements.txt?

Or maybe you can avoid that error by replacing in-place operations in yolo_layer.py.
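
For readers who have not met this error before, here is a minimal sketch of how an in-place operation can trigger it and how an out-of-place rewrite avoids it. It is a toy example, not code from yolo_layer.py; the function names are hypothetical, and torch.exp is used only because it keeps its output for the backward pass.

```python
import torch


def grad_with_inplace(x):
    """Toy example: modifies a tensor that autograd still needs."""
    y = torch.exp(x)   # exp saves its output to compute the gradient later
    y.add_(1.0)        # in-place update overwrites that saved output
    y.sum().backward() # raises RuntimeError (modified by an inplace operation)
    return x.grad


def grad_out_of_place(x):
    """Same computation, but the update creates a new tensor."""
    y = torch.exp(x)
    y = y + 1.0        # out-of-place: the saved output of exp stays intact
    y.sum().backward()
    return x.grad


x = torch.ones(3, requires_grad=True)
print(grad_out_of_place(x))  # gradient of sum(exp(x) + 1) is exp(x)
```

If it is unclear which operation is responsible, enabling torch.autograd.set_detect_anomaly(True) before the forward pass may help, since it makes the error report point at the forward operation whose backward failed.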

Jiali MA · Answer 3 · Thu Dec 19 2019 18:04:28 GMT+0800 (China Standard Time)

May I ask which line contains the in-place operation that needs to be modified? Also, when I train the model on my own data, this error occurs: "IndexError: index 76 is out of bounds for dimension 3 with size 76"

Motoki Kimura · Answer 4 · Thu Dec 19 2019 23:29:38 GMT+0800 (China Standard Time)

@milliema your error seems to be different from the one caused by in-place operations. Could you show me the whole error message? I cannot say anything for sure otherwise.

Jiali MA · Answer 5 · Fri Dec 20 2019 11:35:07 GMT+0800 (China Standard Time)

Thanks for your quick reply! I've modified the code slightly to use it on my own dataset. The modifications are:

changed N_CLASSES in the cfg file;
modified the train/val data directories to follow the COCO format.
Then, when I run train.py, the first error occurs as below:
Traceback (most recent call last):
  File "train_am.py", line 237, in <module>
    main()
  File "train_am.py", line 185, in main
    loss = model(imgs, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolov3.py", line 154, in forward
    x, *loss_dict = module(x, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolo_layer.py", line 188, in forward
    obj_mask[b] = 1-pred_best_iou
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/tensor.py", line 325, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or bitwise_not() operator instead.
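
The RuntimeError above arises because comparisons return bool tensors from PyTorch 1.2 onward, and "1 - mask" is rejected for that dtype. A minimal sketch of two ways to invert such a mask follows; the tensor values and the helper name invert_to_float are made up for illustration and are not taken from yolo_layer.py.

```python
import torch

# Hypothetical IoU values; the comparison yields a bool tensor on PyTorch >= 1.2.
pred_iou = torch.tensor([0.9, 0.2, 0.75])
above_thresh = pred_iou > 0.7


def invert_to_float(mask):
    """Return 1.0 where mask is False and 0.0 where it is True."""
    # Casting first keeps the subtraction valid regardless of the mask dtype.
    return 1.0 - mask.to(torch.float32)


# For a bool mask, the ~ operator gives the same result after a cast.
via_subtraction = invert_to_float(above_thresh)
via_bitwise_not = (~above_thresh).to(torch.float32)
print(via_subtraction, via_bitwise_not)
```

Casting before subtracting is the more conservative choice if the same code has to run on both older and newer PyTorch versions.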

Then I changed the line "obj_mask[b] = 1-pred_best_iou" in yolo_layer.py to "obj_mask[b] = ~pred_best_iou", and the second error occurs as below:
Setting Arguments.. : Namespace(cfg='config/automotive_default.cfg', checkpoint=None, checkpoint_dir='checkpoints', checkpoint_interval=1000, debug=False, eval_interval=4000, n_cpu=0, tfboard_dir=None, use_cuda=True, weights_path='/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/gaussian_yolov3_coco.pth')
train_am.py:57: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
cfg = yaml.load(f)
successfully loaded config file: {'MODEL': {'TYPE': 'YOLOv3', 'BACKBONE': 'darknet53', 'ANCHORS': [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], 'ANCH_MASK': [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 'N_CLASSES': 2, 'GAUSSIAN': True}, 'TRAIN': {'LR': 0.001, 'MOMENTUM': 0.9, 'DECAY': 0.0005, 'BURN_IN': 1000, 'MAXITER': 500000, 'STEPS': '(400000, 450000)', 'BATCHSIZE': 4, 'SUBDIVISION': 16, 'IMGSIZE': 608, 'LOSSTYPE': 'l2', 'IGNORETHRE': 0.7, 'GRADIENT_CLIP': 2000.0}, 'AUGMENTATION': {'RANDRESIZE': True, 'JITTER': 0.3, 'RANDOM_PLACING': True, 'HUE': 0.1, 'SATURATION': 1.5, 'EXPOSURE': 1.5, 'LRFLIP': True, 'RANDOM_DISTORT': True}, 'TEST': {'CONFTHRE': 0.8, 'NMSTHRE': 0.45, 'IMGSIZE': 416}, 'NUM_GPUS': 1}
effective_batch_size = batch_size * iter_size = 4 * 16
Gaussian YOLOv3
Gaussian YOLOv3
Gaussian YOLOv3
loading darknet weights.... /media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/gaussian_yolov3_coco.pth
using cuda
loading annotations into memory...
Done (t=0.05s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
evaluating...
obj_mask torch.Size([4, 3, 76, 76])
0 tensor(0) 0 0
obj_mask torch.Size([4, 3, 76, 76])
1 tensor(0) 0 76
Traceback (most recent call last):
  File "train_am.py", line 237, in <module>
    main()
  File "train_am.py", line 185, in main
    loss = model(imgs, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolov3.py", line 154, in forward
    x, *loss_dict = module(x, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolo_layer.py", line 200, in forward
    obj_mask[b, a, j, i] = 1
IndexError: index 76 is out of bounds for dimension 3 with size 76

Do you have any ideas about the error? Thanks for your help.

Motoki Kimura · Answer 6 · Fri Dec 20 2019 22:29:51 GMT+0800 (China Standard Time)

IndexError: index 76 is out of bounds for dimension 3 with size 76
Dimension 3 (counting from zero) is the x-index on the feature map.
In your dataset, some of the boxes might be located (partially) outside of the image.
Is it possible to clip those boxes to the image in your dataset?
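
As a rough illustration of the suggestion above, the sketch below clips COCO-style [x, y, w, h] boxes to the image and drops boxes that end up empty. The helper names are hypothetical, and grid_x_index is a simplified assumption about how a box center maps to a feature-map cell; the repo's actual preprocessing (resizing, augmentation) is not reproduced here.

```python
def clip_box_xywh(box, img_w, img_h):
    """Clip a COCO-style [x, y, w, h] box to the image.

    Returns None when nothing of the box remains inside the image.
    """
    x, y, w, h = box
    x1 = min(max(x, 0), img_w)
    y1 = min(max(y, 0), img_h)
    x2 = min(max(x + w, 0), img_w)
    y2 = min(max(y + h, 0), img_h)
    if x2 <= x1 or y2 <= y1:
        return None
    return [x1, y1, x2 - x1, y2 - y1]


def grid_x_index(box, img_w, grid_size):
    """Assumed mapping: cell index = floor(center_x / image width * grid size)."""
    x, _, w, _ = box
    center_x = x + w / 2
    return int(center_x / img_w * grid_size)


# A box hanging over the right edge of a 608-pixel-wide image.
raw_box = [590, 10, 40, 20]
clipped_box = clip_box_xywh(raw_box, 608, 608)

print(grid_x_index(raw_box, 608, 76))      # center x = 610 maps to cell 76 (out of range)
print(grid_x_index(clipped_box, 608, 76))  # center x = 599 maps to cell 74 (valid)
```

Under that assumption, a box whose center lies on or beyond the right edge produces index 76 on a 76-cell grid, which matches the IndexError reported above. Clipping pulls the center back inside the image.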