Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes

Introduction

This repository contains the code for the 2nd place solution to the detection challenge held at the CVPR 2020 Retail-Vision workshop. For more information, see my report. All experiments used MMDetection v1.

Dataset

The dataset was originally introduced by Eran Goldman et al. To obtain the dataset for research purposes, please contact the authors.

Getting started

For evaluation, clone pycocotools, change the maxDets parameter to 300, and then install it locally.
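Why the cap matters: COCO-style evaluation scores only the top maxDets detections per image and category, so an image with more annotated objects than the cap cannot reach full recall. The sketch below only illustrates that ceiling. `recall_ceiling` is a hypothetical helper, not part of pycocotools, and the object count is made up.

```python
def recall_ceiling(num_gt: int, max_dets: int) -> float:
    """Best recall achievable on one image when only max_dets detections are scored."""
    if num_gt == 0:
        return 1.0
    return min(num_gt, max_dets) / num_gt


# A densely packed shelf image with 150 annotated products (illustrative count).
for cap in (100, 300):
    print(cap, recall_ceiling(150, cap))
```

With a cap of 100 the ceiling for this image is two thirds, while a cap of 300 removes the limit.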

1. Convert SKU110k csv format to COCO-like json

python sku110k_scripts/sku110k_to_coco.py --args
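As a rough idea of what such a conversion involves, here is a minimal sketch. It is not the repository's script. It assumes the SKU110K annotation CSV has no header and the column order image_name, x1, y1, x2, y2, class, image_width, image_height, and it maps every box to a single category. The function name and the sample rows are hypothetical.

```python
import csv
import io
import json


def sku110k_csv_to_coco(csv_file):
    """Build a COCO-style dict from SKU110K rows.

    Assumed column order (no header row):
    image_name, x1, y1, x2, y2, class, image_width, image_height
    """
    images, annotations = {}, []
    for row in csv.reader(csv_file):
        if not row:
            continue
        name, x1, y1, x2, y2, _cls, width, height = row[:8]
        if name not in images:
            images[name] = {"id": len(images) + 1, "file_name": name,
                            "width": int(width), "height": int(height)}
        x1, y1, x2, y2 = (float(v) for v in (x1, y1, x2, y2))
        w, h = x2 - x1, y2 - y1
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": images[name]["id"],
            "category_id": 1,
            "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": list(images.values()),
            "annotations": annotations,
            "categories": [{"id": 1, "name": "object"}]}


# Two made-up rows for one image.
sample = io.StringIO(
    "train_0.jpg,10,20,110,220,object,2448,3264\n"
    "train_0.jpg,200,40,260,100,object,2448,3264\n"
)
coco = sku110k_csv_to_coco(sample)
print(json.dumps(coco["annotations"][0]))
```

The main point is the box convention: the CSV stores corner coordinates, while COCO stores the top-left corner plus width and height.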

2. Convert a full frame COCO-like dataset to a tiled one

python sku110k_scripts/split_on_tiles.py --args
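The tiling idea can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: it uses a non-overlapping 2x2 grid, and the rule for boxes that cross a tile border (keep the clipped box if at least half of its area is inside the tile) is hypothetical, as is the name `split_into_tiles`.

```python
def split_into_tiles(width, height, boxes, rows=2, cols=2, min_visible=0.5):
    """Split one image into rows x cols tiles and reassign its boxes.

    boxes are COCO-style [x, y, w, h] in full-frame pixels. A box is kept in a
    tile when at least min_visible of its area lies inside that tile; the kept
    box is clipped to the tile and shifted into tile coordinates.
    """
    tile_w, tile_h = width / cols, height / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * tile_w, r * tile_h
            kept = []
            for x, y, w, h in boxes:
                ix0, iy0 = max(x, x0), max(y, y0)
                ix1, iy1 = min(x + w, x0 + tile_w), min(y + h, y0 + tile_h)
                iw, ih = ix1 - ix0, iy1 - iy0
                if iw <= 0 or ih <= 0 or w * h <= 0:
                    continue  # no overlap with this tile, or degenerate box
                if iw * ih / (w * h) >= min_visible:
                    kept.append([ix0 - x0, iy0 - y0, iw, ih])
            tiles.append({"offset": (x0, y0), "size": (tile_w, tile_h), "boxes": kept})
    return tiles


# One box fully inside the top-left tile, one crossing the vertical border.
tiles = split_into_tiles(2000, 3000, [[100, 100, 200, 300], [900, 100, 150, 100]])
for tile in tiles:
    print(tile["offset"], tile["boxes"])
```

Each tile keeps its offset so that predictions made on the tile can later be shifted back into full-frame coordinates.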

3. Training with mmdet

./tools/dist_train.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py 2

4. Testing with mmdet

./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --eval bbox

5. Create a dummy json file for the leaderboard-test

python sku110k_scripts/lb_test_to_coco.py --args

6. Inference with mmdet

./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --format_only --options "jsonfile_prefix=./submit"

7. Convert json output back to SKU110k csv format

python sku110k_scripts/json_out_to_submit.py --args
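The reverse conversion is mostly a change of box convention. The sketch below is not the repository's script. It assumes the detections are a COCO-style results list (image_id, bbox as [x, y, width, height], score) and that a submission row is image_name, x1, y1, x2, y2, score. That column layout, the function name, and the sample detection are assumptions.

```python
import csv
import io


def results_to_csv_rows(results, id_to_name):
    """Turn COCO-style detections into (image_name, x1, y1, x2, y2, score) rows."""
    rows = []
    for det in results:
        x, y, w, h = det["bbox"]  # COCO results store [x, y, width, height]
        rows.append([id_to_name[det["image_id"]],
                     round(x), round(y), round(x + w), round(y + h),
                     det["score"]])
    return rows


# One made-up detection and a hypothetical image id to file name mapping.
results = [{"image_id": 1, "category_id": 1,
            "bbox": [10.0, 20.0, 100.0, 200.0], "score": 0.97}]
rows = results_to_csv_rows(results, {1: "test_0.jpg"})

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```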

Experiments

1. Initial experiments

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	mAP	AP@0.5	AP@0.75	AR	Tr.mAP	Tr.AP@0.5	Tr.AP@0.75	Tr.AR
RetinaNet-r50-fpn	r50	1x	0.001	2	(1333, 800)	4 (octave)	0.463	0.751	0.532	0.512	0.467	0.752	0.535	0.516
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(1333, 800)	[8]	0.523	0.850	0.592	0.582	0.537	0.862	0.612	0.594

2. Non-dense anchoring

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	4tiles	mAP	AP@0.5	AP@0.75	AR	Tr.mAP	Tr.AP@0.5	Tr.AP@0.75	Tr.AR
GA-RetinaNet-r50-fpn	r50	1x	0.001	2	(816, 1088)	4 (octave)	☐	0.523	0.870	0.579	0.583	0.532	0.881	0.590	0.591
GA-RetinaNet-x101-32x4d-fpn	x101-32x4d	1x	0.001	2	(816, 1088)	4 (octave)	☐	0.537	0.882	0.602	0.598	0.552	0.896	0.623	0.610
RepPoints-moment-r50-fpn	r50	1x	0.02	6	(816, 1088)	4 (base)	☐	0.505	0.815	0.578	0.562	0.519	0.820	0.601	0.574

3. Comparison of different anchor scales for Faster-RCNN

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	mAP	AP@0.5	AP@0.75	AR	Tr.mAP	Tr.AP@0.5	Tr.AP@0.75	Tr.AR
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[8]	0.522	0.850	0.591	0.577	0.534	0.862	0.611	0.590
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[4]	0.551	0.912	0.614	0.613	0.567	0.926	0.636	0.629
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[3]	0.549	0.911	0.611	0.614
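To make the anchor_sc column concrete: in MMDetection v1 the side length of the square anchor at each FPN level is the anchor scale multiplied by that level's stride. The sketch below assumes the default FPN strides (4, 8, 16, 32, 64); the configs in this repository may use different values. `anchor_sizes` is a hypothetical helper.

```python
FPN_STRIDES = (4, 8, 16, 32, 64)  # assumed MMDetection FPN defaults


def anchor_sizes(scale, strides=FPN_STRIDES):
    """Side length in pixels of the square (ratio 1.0) anchor at each FPN level."""
    return [scale * stride for stride in strides]


for scale in (8, 4, 3):
    print(scale, anchor_sizes(scale))
```

Under these assumptions the smallest anchor shrinks from 32 px at scale 8 to 16 px at scale 4 and 12 px at scale 3.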

4. Comparison of different anchor scales for RetinaNet

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	mAP	AP@0.5	AP@0.75	AR	Tr.mAP	Tr.AP@0.5	Tr.AP@0.75	Tr.AR
RetinaNet-r50-fpn	r50	1x	0.001	2	(1333, 800)	4 (octave)	0.463	0.751	0.532	0.512	0.467	0.752	0.535	0.516
RetinaNet-r50-fpn	r50	1x	0.001	2	(1333, 800)	3 (octave)	0.508	0.849	0.564	0.569	0.513	0.853	0.574	0.574

5. Bells and whistles testing

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	4tiles	s-nms test	extra augs	traintime flip	testtime flip	mAP	AP@0.5	AP@0.75	AR
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(752, 1024), (816, 1088), (880, 1152)	[4]	☐	☐	☐	✓	☐	0.552	0.912	0.615	0.616
Faster-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[4]	☐	☐	✓	☐	☐	0.548	0.911	0.608	0.612
Faster-RCNN-r50-fpn	r50	2x	0.005	2	(816, 1088)	[4]	☐	☐	✓	✓	☐	0.540	0.906	0.596	0.606
Faster-RCNN-r50-fpn	r50	2x	0.005	2	(816, 1088)	[4]	☐	☐	✓	✓	✓	0.510	0.888	0.543	0.584

6. Cascade-RCNN comparison

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	4tiles	s-nms test	mAP	AP@0.5	AP@0.75	AR	Tr.mAP	Tr.AP@0.5	Tr.AP@0.75	Tr.AR
Cascade-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[8]	☐	☐	0.525	0.840	0.604	0.582	0.542	0.862	0.647	0.596
Cascade-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[4]	☐	☐	0.553	0.902	0.626	0.615	0.574	0.926	0.653	0.634
Cascade-RCNN-r50-fpn	r50	1x	0.005	2	(816, 1088)	[4]	☐	✓	0.556	0.900	0.632	0.622	0.577	0.925	0.659	0.642
Cascade-RCNN-x101-32x4d-fpn	x101-32x4d	1x	0.005	2	(768, 1024)	[4]	☐	☐	0.556	0.903	0.629	0.617	0.583	0.929	0.665	0.640
Cascade-RCNN-x101-32x4d-fpn	x101-32x4d	1x	0.005	2	(768, 1024)	[4]	☐	✓	0.560	0.902	0.635	0.623	0.585	0.929	0.672	0.647
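Assuming the "s-nms test" column denotes soft-NMS applied at test time, the idea can be sketched in plain Python. This is the Gaussian variant with illustrative defaults for `sigma` and `min_score`. MMDetection ships its own soft-NMS op, which may use a different variant and different thresholds, so treat this only as an illustration of the technique.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def soft_nms(boxes, scores, sigma=0.5, min_score=0.05):
    """Gaussian soft-NMS: decay the scores of overlapping boxes instead of dropping them."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Take the highest-scoring box that is still in play.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        decayed = []
        for other, s in remaining:
            s *= math.exp(-iou(box, other) ** 2 / sigma)
            if s >= min_score:  # discard boxes whose score decayed too far
                decayed.append((other, s))
        remaining = decayed
    return kept


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 0, 30, 10)]
kept = soft_nms(boxes, [0.9, 0.8, 0.7])
for box, score in kept:
    print(box, round(score, 3))
```

Unlike hard NMS, the second of the two overlapping boxes survives with a reduced score, which can help when neighbouring products genuinely overlap.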

7. Tiling strategies

Config	Backbone	Lr schd	Base lr	imgs_p_gpu	img_scale	anchor_sc	4tiles	s-nms test	mAP	AP@0.5	AP@0.75	AR
Faster-RCNN-r50-fpn (w/o merging)	r50	1x	0.005	2	(816, 1088)	[8]	✓	☐	0.561	0.912	0.632	0.628
Faster-RCNN-r50-fpn (w/o merging)	r50	1x	0.005	2	(816, 1088)	[4]	✓	☐	0.566	0.928	0.636	0.636
Faster-RCNN-r50-fpn (merged)	r50	1x	0.005	2	(816, 1088)	[4]	✓	☐	0.547	0.894	0.615	0.611
Faster-RCNN-r50-fpn (full frame)	r50	1x	0.005	2	(816, 1088)	[4]	✓	✓	0.577	0.928	0.659	0.654

Citation

Feel free to cite my report if you use any of the results for benchmarking in your work.

@misc{kozlov2020working,
    title={Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes [Technical Report]},
    author={Artem Kozlov},
    year={2020},
    eprint={2006.07825},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

License

Apache License 2.0