Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes
This repository contains the code for the 2nd place solution to the detection challenge held at the CVPR 2020 Retail-Vision workshop.
For more information, see my report. All experiments used MMDetection v1.
The dataset was originally introduced by Eran Goldman et al.
To obtain the dataset for research purposes, please contact the authors.
For evaluation, please clone pycocotools, change the maxDets parameter
to 300 here, and then install it locally.
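Why maxDets matters: the stock COCO evaluation keeps at most 100 detections per image, while a densely packed shelf image can contain more objects than that, so recall is capped before the detector is even scored. The arithmetic sketch below illustrates that ceiling. The count of 147 objects is an illustrative assumption (roughly the per-image average reported for SKU110k), and `recall_ceiling` is a hypothetical helper, not part of pycocotools.

```python
def recall_ceiling(num_gt: int, max_dets: int) -> float:
    """Upper bound on per-image recall when only max_dets detections are kept."""
    if num_gt == 0:
        return 1.0
    # At best, every kept detection matches a distinct ground-truth box.
    return min(num_gt, max_dets) / num_gt


# 147 ground-truth boxes is an assumed, illustrative count.
for max_dets in (100, 300):
    print(max_dets, round(recall_ceiling(147, max_dets), 3))
```

With the default limit the best possible recall on such an image is about 0.68, which is why the limit is raised to 300 before evaluating.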
1. Convert SKU110k csv format to COCO-like json
python sku110k_scripts/sku110k_to_coco.py --args
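For orientation, the conversion in this step can be sketched roughly as follows. This is not the code of `sku110k_to_coco.py`; it is a minimal illustration that assumes each CSV row has the form `image_name,x1,y1,x2,y2,class,image_width,image_height` with no header line, and that all boxes map to a single category. The function name `sku_csv_to_coco` is hypothetical.

```python
import csv
import io


def sku_csv_to_coco(csv_text: str) -> dict:
    """Convert SKU110k-style CSV rows (assumed layout) to a COCO-like dict."""
    images = {}
    annotations = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        # Assumed column order: name, x1, y1, x2, y2, class, width, height
        name, x1, y1, x2, y2, _cls, width, height = row
        x1, y1, x2, y2 = (float(v) for v in (x1, y1, x2, y2))
        if name not in images:
            images[name] = {
                "id": len(images) + 1,
                "file_name": name,
                "width": int(width),
                "height": int(height),
            }
        w, h = x2 - x1, y2 - y1
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": images[name]["id"],
            "category_id": 1,          # single "object" class (assumption)
            "bbox": [x1, y1, w, h],    # COCO uses [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": list(images.values()),
        "annotations": annotations,
        "categories": [{"id": 1, "name": "object"}],
    }


sample = (
    "test_0.jpg,10,20,110,220,object,2448,3264\n"
    "test_0.jpg,200,50,260,150,object,2448,3264\n"
)
coco = sku_csv_to_coco(sample)
print(len(coco["images"]), len(coco["annotations"]))
```

The key detail is the box format change from corner coordinates to `[x, y, width, height]`.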
2. Convert a full frame COCO-like dataset to a tiled one
python sku110k_scripts/split_on_tiles.py --args
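The idea behind tiling can be sketched as below. This is an illustration only, not the logic of `split_on_tiles.py`: it assumes a non-overlapping 2x2 grid and a hypothetical `min_keep` rule that keeps a clipped box only if at least half of its area falls inside the tile. The actual script may use different tile layouts or filtering.

```python
def split_on_tiles(width, height, boxes, rows=2, cols=2, min_keep=0.5):
    """Split full-frame [x, y, w, h] boxes into per-tile boxes.

    Returns one dict per tile with its origin, size, and the boxes
    clipped to the tile and expressed in tile-local coordinates.
    """
    tile_w, tile_h = width / cols, height / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tx, ty = c * tile_w, r * tile_h
            kept = []
            for x, y, w, h in boxes:
                # Intersection of the box with the tile.
                ix1, iy1 = max(x, tx), max(y, ty)
                ix2, iy2 = min(x + w, tx + tile_w), min(y + h, ty + tile_h)
                iw, ih = ix2 - ix1, iy2 - iy1
                if iw <= 0 or ih <= 0:
                    continue
                # Drop boxes that are mostly outside this tile (assumed rule).
                if iw * ih >= min_keep * w * h:
                    kept.append([ix1 - tx, iy1 - ty, iw, ih])
            tiles.append({"origin": (tx, ty), "size": (tile_w, tile_h), "boxes": kept})
    return tiles


boxes = [[10, 10, 20, 20], [95, 10, 20, 20], [150, 60, 30, 30]]
tiles = split_on_tiles(200, 100, boxes)
for tile in tiles:
    print(tile["origin"], tile["boxes"])
```

In the example, the second box straddles a tile border and is kept only in the tile that contains most of it.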
3. Training with mmdet
./tools/dist_train.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py 2
4. Evaluation with mmdet
./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --eval bbox
5. Create a dummy json file for the leaderboard-test
python sku110k_scripts/lb_test_to_coco.py --args
6. Inference with mmdet
./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --format_only --options "jsonfile_prefix=./submit"
7. Convert json output back to SKU110k csv format
python sku110k_scripts/json_out_to_submit.py --args
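As a rough picture of this last step, the sketch below turns COCO-style detection results into CSV rows. It is not the code of `json_out_to_submit.py`. The output row layout `image_name,x1,y1,x2,y2,score` is an assumption about the submission format, and `results_to_csv` and `id_to_name` are hypothetical names.

```python
import csv
import io


def results_to_csv(results, id_to_name, score_thr=0.0):
    """Convert COCO-style detections to CSV text (assumed row layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for det in results:
        if det["score"] < score_thr:
            continue
        # COCO boxes are [x, y, width, height]; convert back to corners.
        x, y, w, h = det["bbox"]
        writer.writerow([
            id_to_name[det["image_id"]],
            round(x), round(y), round(x + w), round(y + h),
            f"{det['score']:.4f}",
        ])
    return buf.getvalue()


results = [
    {"image_id": 1, "category_id": 1, "bbox": [10.2, 20.0, 100.0, 200.4], "score": 0.9},
]
print(results_to_csv(results, {1: "test_0.jpg"}), end="")
```

The image id to file name mapping would come from the dummy JSON created in step 5.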
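| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 4 (octave) | 0.463 | 0.751 | 0.532 | 0.512 | 0.467 | 0.752 | 0.535 | 0.516 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (1333, 800) | [8] | 0.523 | 0.850 | 0.592 | 0.582 | 0.537 | 0.862 | 0.612 | 0.594 |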
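| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GA-RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (816, 1088) | 4 (octave) | ☐ | 0.523 | 0.870 | 0.579 | 0.583 | 0.532 | 0.881 | 0.590 | 0.591 |
| GA-RetinaNet-x101-32x4d-fpn | x101-32x4d | 1x | 0.001 | 2 | (816, 1088) | 4 (octave) | ☐ | 0.537 | 0.882 | 0.602 | 0.598 | 0.552 | 0.896 | 0.623 | 0.610 |
| RepPoints-moment-r50-fpn | r50 | 1x | 0.02 | 6 | (816, 1088) | 4 (base) | ☐ | 0.505 | 0.815 | 0.578 | 0.562 | 0.519 | 0.820 | 0.601 | 0.574 |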
3. Comparison of different anchor scales for Faster-RCNN
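| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | 0.522 | 0.850 | 0.591 | 0.577 | 0.534 | 0.862 | 0.611 | 0.590 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | 0.551 | 0.912 | 0.614 | 0.613 | 0.567 | 0.926 | 0.636 | 0.629 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [3] | 0.549 | 0.911 | 0.611 | 0.614 |  |  |  |  |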
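To make the anchor_sc column concrete, the sketch below computes the side length of the square (ratio 1.0) anchor on each FPN level for the three scales compared above. It assumes the MMDetection default RPN strides of 4, 8, 16, 32, and 64 and that the base anchor size equals the stride; the repository configs may differ. `anchor_sides` is a hypothetical helper.

```python
def anchor_sides(scale, strides=(4, 8, 16, 32, 64)):
    """Side length in pixels of the ratio-1.0 anchor on each FPN level."""
    # Assumption: base anchor size equals the level stride.
    return [scale * stride for stride in strides]


for scale in (8, 4, 3):
    print(scale, anchor_sides(scale))
# 8 [32, 64, 128, 256, 512]
# 4 [16, 32, 64, 128, 256]
# 3 [12, 24, 48, 96, 192]
```

Under these assumptions, halving the scale from 8 to 4 halves every anchor, so the smallest anchors shrink from 32 to 16 pixels at the network input resolution.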
4. Comparison of different anchor scales for RetinaNet
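| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 4 (octave) | 0.463 | 0.751 | 0.532 | 0.512 | 0.467 | 0.752 | 0.535 | 0.516 |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 3 (octave) | 0.508 | 0.849 | 0.564 | 0.569 | 0.513 | 0.853 | 0.574 | 0.574 |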
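For RetinaNet the anchor_sc value is an octave base scale rather than a single anchor scale. The sketch below shows how that base scale expands into per-level anchor sizes, assuming the usual RetinaNet scheme of 3 scales per octave and strides of 8, 16, 32, 64, and 128 (the MMDetection defaults, which the repository configs may override). `retinanet_anchor_sides` is a hypothetical helper.

```python
def retinanet_anchor_sides(octave_base_scale, scales_per_octave=3,
                           strides=(8, 16, 32, 64, 128)):
    """Side lengths of the ratio-1.0 anchors on each pyramid level."""
    # Scales within one octave: base * 2^(i / scales_per_octave).
    scales = [octave_base_scale * 2 ** (i / scales_per_octave)
              for i in range(scales_per_octave)]
    return {stride: [round(s * stride, 1) for s in scales] for stride in strides}


for base in (4, 3):
    print(base, retinanet_anchor_sides(base)[8])
```

On the finest level (stride 8) the smallest anchor shrinks from 32 to 24 pixels when the base scale goes from 4 to 3.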
5. Bells and whistles testing
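| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | extra augs | traintime flip | testtime flip | mAP | AP@0.5 | AP@0.75 | AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (752, 1024), (816, 1088), (880, 1152) | [4] | ☐ | ☐ | ☐ | ✓ | ☐ | 0.552 | 0.912 | 0.615 | 0.616 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ☐ | ☐ | 0.548 | 0.911 | 0.608 | 0.612 |
| Faster-RCNN-r50-fpn | r50 | 2x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ✓ | ☐ | 0.540 | 0.906 | 0.596 | 0.606 |
| Faster-RCNN-r50-fpn | r50 | 2x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ✓ | ✓ | 0.510 | 0.888 | 0.543 | 0.584 |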
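Test-time flip means running the detector on a horizontally mirrored copy of the image as well and mapping those detections back to the original coordinates before combining them. The coordinate mapping can be sketched as below; `unflip_boxes` is a hypothetical helper and the corner box format `[x1, y1, x2, y2, score]` is an assumption.

```python
def unflip_boxes(boxes, image_width):
    """Map boxes predicted on a horizontally flipped image back to the original."""
    # Mirroring swaps the roles of the left and right edges.
    return [
        [image_width - x2, y1, image_width - x1, y2, score]
        for x1, y1, x2, y2, score in boxes
    ]


flipped = [[10, 20, 50, 80, 0.9]]
print(unflip_boxes(flipped, image_width=200))
# [[150, 20, 190, 80, 0.9]]
```

In the last row of the table, enabling test-time flip went together with a drop in mAP from 0.540 to 0.510.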
6. Cascade-RCNN comparison
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | ☐ | ☐ | 0.525 | 0.840 | 0.604 | 0.582 | 0.542 | 0.862 | 0.647 | 0.596 |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | 0.553 | 0.902 | 0.626 | 0.615 | 0.574 | 0.926 | 0.653 | 0.634 |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ✓ | 0.556 | 0.900 | 0.632 | 0.622 | 0.577 | 0.925 | 0.659 | 0.642 |
| Cascade-RCNN-x101-32x4d-fpn | x101-32x4d | 1x | 0.005 | 2 | (768, 1024) | [4] | ☐ | ☐ | 0.556 | 0.903 | 0.629 | 0.617 | 0.583 | 0.929 | 0.665 | 0.640 |
| Cascade-RCNN-x101-32x4d-fpn | x101-32x4d | 1x | 0.005 | 2 | (768, 1024) | [4] | ☐ | ✓ | 0.560 | 0.902 | 0.635 | 0.623 | 0.585 | 0.929 | 0.672 | 0.647 |
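The "s-nms test" column refers to using Soft-NMS at test time. Instead of discarding boxes that overlap a higher-scoring box, Soft-NMS lowers their scores, which can help when neighbouring products legitimately overlap. Below is a minimal pure-Python sketch of the linear variant. The thresholds `iou_thr=0.5` and `min_score=0.05` are assumed values, and this is not the implementation used by MMDetection.

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2, ...]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(dets, iou_thr=0.5, min_score=0.05):
    """Linear Soft-NMS over [x1, y1, x2, y2, score] detections."""
    remaining = [list(d) for d in dets]  # copy so the input is not modified
    kept = []
    while remaining:
        # Pick the highest-scoring remaining detection.
        best = max(range(len(remaining)), key=lambda i: remaining[i][4])
        current = remaining.pop(best)
        kept.append(current)
        survivors = []
        for d in remaining:
            overlap = iou(current, d)
            if overlap > iou_thr:
                d[4] *= 1.0 - overlap  # decay the score instead of dropping
            if d[4] >= min_score:
                survivors.append(d)
        remaining = survivors
    return kept


dets = [[0, 0, 10, 10, 0.9], [1, 0, 11, 10, 0.8], [20, 20, 30, 30, 0.7]]
for det in soft_nms_linear(dets):
    print(det)
```

In the example, the second box overlaps the first heavily; hard NMS would remove it, while Soft-NMS keeps it with a reduced score.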
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | mAP | AP@0.5 | AP@0.75 | AR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster-RCNN-r50-fpn (w/o merging) | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | ✓ | ☐ | 0.561 | 0.912 | 0.632 | 0.628 |
| Faster-RCNN-r50-fpn (w/o merging) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ☐ | 0.566 | 0.928 | 0.636 | 0.636 |
| Faster-RCNN-r50-fpn (merged) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ☐ | 0.547 | 0.894 | 0.615 | 0.611 |
| Faster-RCNN-r50-fpn (full frame) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ✓ | 0.577 | 0.928 | 0.659 | 0.654 |
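The "merged" row implies that tile-level detections are mapped back to full-frame coordinates. One common way to do this, sketched below, is to shift each detection by its tile origin and then run NMS over the union to remove duplicates where tiles overlap. This is an assumption about the general technique, not a description of the merging used for the results above; `merge_tiles` is a hypothetical helper and overlapping tiles are assumed.

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2, ...]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tiles(tile_dets, iou_thr=0.5):
    """Merge per-tile detections into full-frame detections.

    tile_dets: list of ((offset_x, offset_y), [[x1, y1, x2, y2, score], ...]).
    """
    shifted = []
    for (ox, oy), dets in tile_dets:
        for x1, y1, x2, y2, score in dets:
            # Move tile-local coordinates into the full frame.
            shifted.append([x1 + ox, y1 + oy, x2 + ox, y2 + oy, score])
    # Greedy NMS: keep a box only if it does not overlap a better one.
    shifted.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for d in shifted:
        if all(iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept


tile_dets = [
    ((0, 0), [[80, 10, 100, 40, 0.9]]),
    ((70, 0), [[10, 10, 30, 40, 0.8], [50, 10, 70, 40, 0.7]]),
]
print(merge_tiles(tile_dets))
```

In the example, the same product is detected in two overlapping tiles and only the higher-scoring copy survives.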
Feel free to cite my report if you use any of the results for benchmarking in your work.
@misc{kozlov2020working,
title={Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes [Technical Report]},
author={Artem Kozlov},
year={2020},
eprint={2006.07825},
archivePrefix={arXiv},
primaryClass={cs.CV}
}