SCD: A Stacked Carton Dataset for Detection and Segmentation

Jinrong Yang¹ Shengkai Wu¹ Lijun Gou¹ Hangcheng Yu¹ Chenxi Lin¹ Jiazhuo Wang¹ Pan Wang¹ Minxuan Li² Xiaoping Li¹

¹State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, China.
²Faculty of Arts and Science, Queen’s University, Canada


1. Abstract

Carton detection is an important technique in automatic logistics systems and can be applied to many tasks, such as stacking and unstacking cartons and unloading cartons from containers. However, until now there has been no public large-scale carton dataset for the research community to train and evaluate carton detection models, which hinders the development of carton detection. In this paper, we present a large-scale carton dataset named Stacked Carton Dataset (SCD) with the goal of advancing the state of the art in carton detection. Images are collected from the internet and several warehouses, and objects are labeled with per-instance segmentation masks for precise localization. In total, there are about 250,000 instance masks from 16,136 images. In addition, we design a carton detector based on RetinaNet by embedding a Boundary Guided Supervision module (BGS) and an Offset Prediction between Classification and Localization module (OPCL). OPCL alleviates the imbalance between classification and localization quality, boosting AP by 3.1% to 4.7% on SCD, while BGS guides the detector to pay more attention to the boundary information of cartons and to decouple repeated carton textures. To demonstrate that OPCL generalizes to other datasets, we conduct extensive experiments on MS COCO and PASCAL VOC. The AP improvements on MS COCO and PASCAL VOC are 1.8% to 2.2% and 3.4% to 4.3%, respectively.

2. Paper

Paper on arXiv => "SCD: A Stacked Carton Dataset for Detection and Segmentation"

3. SCD

3.1 Dataset license

CC BY-NC-SA 4.0

3.2 Image examples

3.3 Annotations

Example of instance annotation in SCD. The first row shows the four-label annotation style used in LSCD, while the second row shows the single-label style used in OSCD. In the first row, blue, green, red and yellow denote Carton-inner-all, Carton-inner-occlusion, Carton-outer-all and Carton-outer-occlusion, respectively.

3.4 Overview information of SCD

Dataset	Images	Split (training/test)	Labels	All/Occlusion	Inner/Outer	Total Instances	Average Instances per Image
LSCD	7,735	6,735/1,000	4&1	√	√	81,870	10.58
OSCD	8,401	7,401/1,000	1	×	×	168,748	20.09
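The "Average Instances" column is consistent with Total Instances divided by Images over the whole subset (training and test together). A small sketch that reproduces the column from the table's own figures, assuming exactly that definition:

```python
# Figures copied from the SCD overview table.
SUBSETS = {
    "LSCD": {"images": 7_735, "instances": 81_870},
    "OSCD": {"images": 8_401, "instances": 168_748},
}


def average_instances(images: int, instances: int) -> float:
    """Mean number of annotated cartons per image, rounded to two decimals.

    Assumption: the table's average is total instances / total images.
    """
    return round(instances / images, 2)


for name, subset in SUBSETS.items():
    print(name, average_instances(subset["images"], subset["instances"]))

# Totals over both subsets (the abstract rounds the instance count to about 250,000).
total_images = sum(s["images"] for s in SUBSETS.values())
total_instances = sum(s["instances"] for s in SUBSETS.values())
print("SCD", total_images, total_instances)
```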

3.5 Data classification and download link

OSCD:

(1) OSCD => "Images and COCO-style labels" (password: d8sj)

Google Drive link: https://drive.google.com/file/d/1YeZ4mg_qZ4dBvKKfgGF8RQcyOMNoMp37/view?usp=sharing

LSCD:

(1) LSCD => "Images and COCO-style labels (containing Carton-inner-all, Carton-inner-occlusion, Carton-outer-all and Carton-outer-occlusion)" (password: 7ebi)

Google Drive link: https://drive.google.com/file/d/1nJKS5YwsfxciRXg9ZooGcgdoXuKESMaj/view?usp=sharing

(2) LSCD => "Images and COCO-style labels (only containing carton)" (password: vrqn)

Google Drive link: https://drive.google.com/file/d/1JRk_YjPpGcTCB-bvlJ37KsI5Sx1tfbUc/view?usp=sharing

*Notice: The dataset can be downloaded from Baidu Drive (the passwords are listed above), and Google Drive links are now also provided. If you have any questions, please email us and we will reply within 3 days (panwang725@hust.edu.cn, yangjinrong@hust.edu.cn).
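Since the labels are distributed as COCO-style JSON, they can be inspected with the Python standard library alone. The sketch below assumes only the standard COCO top-level keys (`images`, `annotations`, `categories`); it does not assume particular file names or category names inside the archives, so the path you pass to `load_coco` is whatever annotation file you extracted. The toy dictionary is hand-written for illustration and is not real SCD data.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-style annotation file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def instances_per_category(coco: dict) -> Counter:
    """Count annotated instances for each category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


def average_instances_per_image(coco: dict) -> float:
    """Mean number of annotations per image."""
    return len(coco["annotations"]) / len(coco["images"])


# Hand-written example in the COCO layout (not real SCD data).
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "Carton-inner-all"},
                   {"id": 2, "name": "Carton-outer-all"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 2, "category_id": 1},
    ],
}
print(instances_per_category(toy))
print(average_instances_per_image(toy))
```

On a real file, `instances_per_category(load_coco(path))` reports how many instances each label has, which is a quick sanity check after downloading either the four-label or the carton-only LSCD annotations.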

3.6 Dataset statistics

The first row shows the statistical distribution of LSCD and the second row that of OSCD. From left to right, the charts show the width, height, aspect ratio and pixel area of instances, and the number of objects per image. Note that the width, height and area of each instance are normalized by the width and height of the corresponding image, and the aspect ratio is normalized with a log function.
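The normalization described above can be sketched per instance as follows. This is a reading of the description, not the authors' plotting script: it assumes COCO-style `[x, y, w, h]` boxes, takes the aspect ratio as `w / h`, and uses the natural logarithm, none of which the text pins down. The area is normalized by the image area (width × height).

```python
import math
from collections import Counter


def instance_stats(bbox, img_w, img_h, area=None):
    """Normalized statistics for one instance.

    bbox is a COCO-style [x, y, w, h] box in pixels; `area` is the instance's
    pixel area (COCO's "area" field) and defaults to the box area w * h.
    Assumptions: aspect ratio = w / h, and "log" is the natural logarithm.
    """
    _, _, w, h = bbox
    if area is None:
        area = w * h
    return {
        "width": w / img_w,                    # normalized by image width
        "height": h / img_h,                   # normalized by image height
        "log_aspect_ratio": math.log(w / h),   # 0 for a square box
        "area": area / (img_w * img_h),        # fraction of the image covered
    }


def objects_per_image(coco):
    """Number of annotated objects in each image, including images with none."""
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    return [counts.get(img["id"], 0) for img in coco["images"]]
```

Taking the log keeps wide and tall boxes symmetric around zero (a 2:1 box and a 1:2 box map to +log 2 and -log 2), which is a common reason to plot aspect ratios on a log scale.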

4. Proposed baseline method on SCD

4.1 RetinaNet with OPCL and BGS

4.2 Baseline

Dataset	Labels	Model	mAP	AP50	AP75
OSCD	1	RetinaNet	72.1	90.8	80.5
OSCD	1	RetinaNet+	76.6	91.8	83.6
OSCD	1	FCOS	72.8	91.1	80.6
OSCD	1	Faster R-CNN	69.0	90.1	77.8
LSCD	1	RetinaNet	79.8	95.2	87.9
LSCD	1	RetinaNet+	84.7	95.8	89.8
LSCD	1	FCOS	76.5	93.7	84.3
LSCD	1	Faster R-CNN	77.5	94.5	86.3
LSCD	4	RetinaNet	65.7	80.4	73.0
LSCD	4	RetinaNet+	69.9	80.0	74.9
LSCD	4	FCOS	68.1	81.2	74.8
LSCD	4	Faster R-CNN	61.2	79.5	70.1
LSCD+OSCD	1	RetinaNet	82.0	95.9	89.8
LSCD+OSCD	1	RetinaNet+	86.1	96.3	91.2
LSCD+OSCD	1	FCOS	83.8	96.2	90.4
LSCD+OSCD	1	Faster R-CNN	80.6	95.7	89.2
LSCD+OSCD	4	RetinaNet	67.4	80.8	74.1
LSCD+OSCD	4	RetinaNet+	71.5	80.9	76.4
LSCD+OSCD	4	FCOS	71.1	82.0	76.8
LSCD+OSCD	4	Faster R-CNN	64.7	81.2	73.7

Comparison of detection performance of three state-of-the-art methods on SCD. For LSCD, both the 1-label and the 4-label settings are evaluated. LSCD+OSCD means that detectors are first pre-trained on OSCD and then fine-tuned on LSCD. RetinaNet+ denotes RetinaNet trained with GIoU loss.
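RetinaNet+ differs from RetinaNet by using the GIoU loss, presumably for box regression. For reference, here is a minimal scalar sketch of the standard Generalized IoU loss for two axis-aligned boxes in `(x1, y1, x2, y2)` format, where GIoU = IoU − |C \ (A ∪ B)| / |C|, C is the smallest box enclosing both, and the loss is 1 − GIoU. It illustrates the definition only; it is not the authors' training code, which would operate on batched tensors inside a detection framework.

```python
def giou_loss(pred, target):
    """GIoU loss (1 - GIoU) for two boxes given as (x1, y1, x2, y2).

    Both boxes are assumed to have positive area.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    area_p = max(0.0, px2 - px1) * max(0.0, py2 - py1)
    area_t = max(0.0, tx2 - tx1) * max(0.0, ty2 - ty1)

    # Intersection and union of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = area_p + area_t - inter
    iou = inter / union

    # Smallest enclosing box C; the penalty is the part of C not covered by the union.
    area_c = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```

Unlike a plain 1 − IoU loss, this still gives a useful signal when the boxes do not overlap: for unit boxes `(0, 0, 1, 1)` and `(2, 0, 3, 1)` the IoU is 0 but the enclosing-box term makes the loss 4/3, so moving the prediction closer reduces it.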

4.3 Main results

Main results of RetinaNet with all our proposed modules. "pretrain" means that the same model is pre-trained on OSCD and fine-tuned on LSCD, with an image scale of [600, 1000] ([800, 1333] for entries marked †). "1x" means the model is trained for 12 epochs in total.
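The README does not spell out how an image scale such as [600, 1000] is applied. A common convention in detection codebases is an aspect-ratio-preserving resize: scale the shorter side to 600 unless that would push the longer side past 1000, in which case the longer side is capped at 1000. A sketch under that assumption, with `rescaled_size` as a hypothetical helper name:

```python
def rescaled_size(width, height, short=600, long=1000):
    """Target (width, height) for a keep-ratio resize.

    Assumption: the shorter side goes to `short` unless the longer side would
    exceed `long`, in which case the longer side is capped at `long`.
    """
    scale = min(short / min(width, height), long / max(width, height))
    # Round half up to whole pixels.
    return int(width * scale + 0.5), int(height * scale + 0.5)
```

For the † setting the same helper would be called as `rescaled_size(w, h, short=800, long=1333)`.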

5. Leaderboard

SCD-Leaderboard

If you have successfully built a model on the training set and it performs well on the validation set, we encourage you to run it on the test set. You can submit your results to the SCD leaderboard by creating a new issue. Your results will be ranked on the leaderboard so that you can benchmark your approach against those of others. We look forward to your submission. Please click here to submit.

6. Attention

The dataset is free for academic use, but please do not use it for commercial purposes. Use it at your own risk. For other purposes, please contact the corresponding authors Pan Wang or Jinrong Yang (panwang725@hust.edu.cn, yangjinrong@hust.edu.cn).

Apache License 2.0