SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.

mmdetection training on COCO2017 does not converge

jamesben6688 opened this issue · comments

commented

Hi Ali, I tried your code on COCO2017 using mmdetection, but the training does not converge. I tried both cascade_mask_rcnn and mask_rcnn, but neither of them converges.

My environment:

python: 3.8
pytorch: 1.11.0+cu113
mmcv-full: 1.4.8
mmdet: 2.19.0
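(A quick way to confirm these versions from inside the training environment; this is a generic stdlib sketch, not part of the NAT repo, and it queries installed package metadata without importing the heavy packages themselves.)

```python
from importlib import metadata

def report_versions(packages=("torch", "mmcv-full", "mmdet")):
    """Return {package: installed version, or None if absent}."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = None
    return out

if __name__ == "__main__":
    for pkg, ver in report_versions().items():
        print(f"{pkg}: {ver or 'not installed'}")
```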

COCO directory:

+ annotations
    + captions_train2017.json
    + instances_train2017.json
    + person_keypoints_train2017.json
    + captions_val2017.json
    + instances_val2017.json
    + person_keypoints_val2017.json
+ train2017
+ val2017
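(For reference, this layout matches what the stock mmdet 2.x `configs/_base_/datasets/coco_instance.py` expects; a minimal sketch of the relevant paths, assuming `data_root` points at the folder above:)

```python
# Dataset path fragment mirroring mmdet 2.x's coco_instance.py base config.
# 'data/coco/' is an assumed location; adjust data_root to your setup.
data_root = 'data/coco/'
data = dict(
    train=dict(
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/'),
    val=dict(
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/'),
    test=dict(
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/'))
```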

All raw files were downloaded from the COCO official website. The loss remains at a high level, and all the average precision (AP) values are zero.

Below are the first few hundred iterations of the training log:

2023-11-09 23:15:30,812 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-11-09 23:15:44,534 - mmdet - INFO - Epoch [1][50/7330]     lr: 9.890e-06, eta: 4 days, 10:26:14, time: 1.452, data_time: 1.204, memory: 10086, loss_rpn_cls: 0.5645, loss_rpn_bbox: 0.2489, loss_cls: 4.5210, acc: 0.0808, loss_bbox: 0.0397, loss_mask: 0.7171, loss: 6.0913
2023-11-09 23:15:57,922 - mmdet - INFO - Epoch [1][100/7330]    lr: 1.988e-05, eta: 2 days, 15:01:05, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2420, loss_cls: 4.5210, acc: 0.0857, loss_bbox: 0.0413, loss_mask: 0.7138, loss: 6.0784
2023-11-09 23:16:11,390 - mmdet - INFO - Epoch [1][150/7330]    lr: 2.987e-05, eta: 2 days, 0:34:55, time: 0.269, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5621, loss_rpn_bbox: 0.2455, loss_cls: 4.5187, acc: 0.0850, loss_bbox: 0.0388, loss_mask: 0.7166, loss: 6.0818
2023-11-09 23:16:24,826 - mmdet - INFO - Epoch [1][200/7330]    lr: 3.986e-05, eta: 1 day, 17:20:59, time: 0.269, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2423, loss_cls: 4.5181, acc: 0.0747, loss_bbox: 0.0390, loss_mask: 0.7185, loss: 6.0783
2023-11-09 23:16:38,230 - mmdet - INFO - Epoch [1][250/7330]    lr: 4.985e-05, eta: 1 day, 12:59:59, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5616, loss_rpn_bbox: 0.2448, loss_cls: 4.5196, acc: 0.0811, loss_bbox: 0.0388, loss_mask: 0.7180, loss: 6.0826
2023-11-09 23:16:51,530 - mmdet - INFO - Epoch [1][300/7330]    lr: 5.984e-05, eta: 1 day, 10:04:23, time: 0.266, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5652, loss_rpn_bbox: 0.2486, loss_cls: 4.5217, acc: 0.0913, loss_bbox: 0.0417, loss_mask: 0.7145, loss: 6.0917
2023-11-09 23:17:05,057 - mmdet - INFO - Epoch [1][350/7330]    lr: 6.983e-05, eta: 1 day, 8:01:45, time: 0.271, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5610, loss_rpn_bbox: 0.2403, loss_cls: 4.5226, acc: 0.0725, loss_bbox: 0.0397, loss_mask: 0.7176, loss: 6.0812
2023-11-09 23:17:18,510 - mmdet - INFO - Epoch [1][400/7330]    lr: 7.982e-05, eta: 1 day, 6:28:54, time: 0.269, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5635, loss_rpn_bbox: 0.2402, loss_cls: 4.5204, acc: 0.0942, loss_bbox: 0.0403, loss_mask: 0.7168, loss: 6.0812
2023-11-09 23:17:32,048 - mmdet - INFO - Epoch [1][450/7330]    lr: 8.981e-05, eta: 1 day, 5:17:28, time: 0.271, data_time: 0.032, memory: 10086, loss_rpn_cls: 0.5633, loss_rpn_bbox: 0.2524, loss_cls: 4.5189, acc: 0.0957, loss_bbox: 0.0401, loss_mask: 0.7142, loss: 6.0889
2023-11-09 23:17:45,381 - mmdet - INFO - Epoch [1][500/7330]    lr: 9.980e-05, eta: 1 day, 4:18:28, time: 0.267, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5636, loss_rpn_bbox: 0.2419, loss_cls: 4.5199, acc: 0.0798, loss_bbox: 0.0374, loss_mask: 0.7165, loss: 6.0792
2023-11-09 23:17:58,847 - mmdet - INFO - Epoch [1][550/7330]    lr: 1.000e-04, eta: 1 day, 3:31:13, time: 0.269, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5648, loss_rpn_bbox: 0.2537, loss_cls: 4.5214, acc: 0.0837, loss_bbox: 0.0398, loss_mask: 0.7180, loss: 6.0977
2023-11-09 23:18:12,282 - mmdet - INFO - Epoch [1][600/7330]    lr: 1.000e-04, eta: 1 day, 2:51:35, time: 0.269, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5626, loss_rpn_bbox: 0.2425, loss_cls: 4.5211, acc: 0.0916, loss_bbox: 0.0390, loss_mask: 0.7159, loss: 6.0809
2023-11-09 23:18:26,248 - mmdet - INFO - Epoch [1][650/7330]    lr: 1.000e-04, eta: 1 day, 2:21:36, time: 0.279, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5682, loss_rpn_bbox: 0.2603, loss_cls: 4.5202, acc: 0.0918, loss_bbox: 0.0398, loss_mask: 0.7182, loss: 6.1067
2023-11-09 23:18:39,892 - mmdet - INFO - Epoch [1][700/7330]    lr: 1.000e-04, eta: 1 day, 1:53:50, time: 0.273, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5629, loss_rpn_bbox: 0.2412, loss_cls: 4.5175, acc: 0.0818, loss_bbox: 0.0388, loss_mask: 0.7152, loss: 6.0755
2023-11-09 23:18:53,903 - mmdet - INFO - Epoch [1][750/7330]    lr: 1.000e-04, eta: 1 day, 1:31:54, time: 0.280, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5668, loss_rpn_bbox: 0.2636, loss_cls: 4.5208, acc: 0.0779, loss_bbox: 0.0391, loss_mask: 0.7168, loss: 6.1071
2023-11-09 23:19:07,780 - mmdet - INFO - Epoch [1][800/7330]    lr: 1.000e-04, eta: 1 day, 1:11:56, time: 0.278, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5662, loss_rpn_bbox: 0.2581, loss_cls: 4.5178, acc: 0.0918, loss_bbox: 0.0401, loss_mask: 0.7177, loss: 6.0998
2023-11-09 23:19:21,543 - mmdet - INFO - Epoch [1][850/7330]    lr: 1.000e-04, eta: 1 day, 0:53:43, time: 0.275, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2394, loss_cls: 4.5216, acc: 0.0830, loss_bbox: 0.0397, loss_mask: 0.7149, loss: 6.0760
2023-11-09 23:19:35,609 - mmdet - INFO - Epoch [1][900/7330]    lr: 1.000e-04, eta: 1 day, 0:38:57, time: 0.281, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5678, loss_rpn_bbox: 0.2609, loss_cls: 4.5206, acc: 0.0815, loss_bbox: 0.0400, loss_mask: 0.7168, loss: 6.1061

This looks abnormal. Did you perform any preprocessing on the COCO dataset?
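(Not from the thread, but one common cause of a flat loss with zero AP is the pretrained backbone silently failing to load, e.g. because of a key-prefix mismatch between the checkpoint and the model. A pure-Python sketch of the key comparison that mmcv's checkpoint loader reports as missing/unexpected keys; the parameter names below are made up for illustration:)

```python
def diff_state_dict_keys(ckpt_keys, model_keys):
    """Report checkpoint keys the model never consumed ("unexpected")
    and model keys the checkpoint never filled ("missing")."""
    ckpt_keys, model_keys = set(ckpt_keys), set(model_keys)
    return {
        "unexpected_keys": sorted(ckpt_keys - model_keys),
        "missing_keys": sorted(model_keys - ckpt_keys),
    }

# Toy example: a checkpoint saved with a 'backbone.' prefix will not match
# a model whose parameter names are unprefixed, so nothing actually loads.
print(diff_state_dict_keys(
    ["backbone.patch_embed.proj.weight"],
    ["patch_embed.proj.weight"]))
```

If both lists are long (every key mismatched), the backbone is training from scratch, which would explain losses pinned at their initial values.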

commented

This seems to have been caused by a mismatch between the GPU model, CUDA version, and PyTorch version.