Robot Grasping in Dense Clutter
Overview
This repository contains an implementation of our proposed algorithm for grasp detection in dense clutter. The algorithm consists of three steps: instance segmentation, view-based experience transfer and optimal grasp determination.
- Instance Segmentation - Mask R-CNN is adopted to segment easy-to-grasp objects from a cluttered scene.
- View-based Experience Transfer - A Denoise Autoencoder is used to estimate the corresponding view of each segmented object; grasp experiences can then be transferred onto the cluttered scene.
- Optimal Grasp Determination - The optimal grasp is selected from the collision-free grasp candidates.
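The three-step pipeline can be sketched as follows. This is a minimal illustration of the control flow only; the function names and stub data are placeholders, not this repository's actual API.

```python
# Minimal sketch of the three-step pipeline; all names are illustrative.

def segment_instances(rgb, depth):
    """Step 1: Mask R-CNN segments easy-to-grasp objects (stubbed here)."""
    return [{"mask": m} for m in ("obj_a", "obj_b")]

def transfer_experience(instance):
    """Step 2: a Denoise Autoencoder estimates the object's view and
    transfers pre-recorded grasp experiences onto the scene (stubbed)."""
    return [{"instance": instance, "score": s} for s in (0.7, 0.9)]

def select_optimal_grasp(candidates):
    """Step 3: pick the best grasp among collision-free candidates."""
    return max(candidates, key=lambda g: g["score"])

def detect_grasp(rgb, depth):
    candidates = []
    for inst in segment_instances(rgb, depth):
        candidates.extend(transfer_experience(inst))
    return select_optimal_grasp(candidates)
```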
A system consisting of a six-axis robot arm with a two-jaw parallel gripper and a Kinect V2 RGB-D camera is used to evaluate the success rate of grasping in dense clutter. Grasping experiments on cluttered metal parts achieve a success rate of about 94%.
Demonstration of the hand-eye system and the algorithm
Demonstration of two types of grasping methods
For more information about our approach, please check out our summary video and our paper:
Robot Grasping in Dense Clutter via View-Based Experience Transfer
Jen-Wei Wang and Jyh-Jone Lee
Contact
If you have any questions, please email Jen-Wei Wang
Quick Start
To run this code, please navigate to the algorithm directory:
cd algorithm
Installation
This code was developed with Python 3.5 on Ubuntu 16.04 with an NVIDIA GTX 1080 Ti GPU. Python requirements can be installed by:
pip install -r requirements.txt
There are two pre-trained models:
- The Mask R-CNN model can be downloaded here
- The Denoise Autoencoder model is included in three files named chkpt-80000.
Evaluation
Testing images are provided in test_images. Run our code on the testing images:
python detection_algorithm.py --rgb=./test_images/rgb.png --depth=./test_images/depth.png
Testing results will be saved in test_images.
| Clutter Scene | Segmentation | Collision-Free Grasps | Optimal Grasps |
|---|---|---|---|
config.yaml contains some parameters that can be adjusted.
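For flat `key: value` entries, such a config can be read without extra dependencies (PyYAML's `safe_load` handles the general case). The snippet below is a minimal sketch; the keys shown are illustrative, not the repository's actual parameter names.

```python
# Minimal flat-YAML reader; see config.yaml for the real parameter names.

def load_flat_config(text):
    """Parse flat 'key: value' lines into a dict, coercing numbers."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        try:
            cfg[key] = float(value) if "." in value else int(value)
        except ValueError:
            cfg[key] = value  # keep the raw string when not numeric
    return cfg

example = """
# illustrative parameters, not the repository's actual keys
score_threshold: 0.5
max_grasps: 10
"""
config = load_flat_config(example)
```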
Mask R-CNN
The code for Mask R-CNN is based on the repository implemented by matterport.
Dataset
The annotated dataset is provided here. We improved the data collection and labeling process; the details are shown in the following figure and in our paper.
Train
To prevent over-fitting, we made the following revisions:
- Use pre-trained ResNet-50 as backbone
- Fine-tune parameters not in the backbone
- Reduce types of anchors from 5 to 3
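The revisions above can be expressed in Matterport-style configuration terms. All names and values below are illustrative assumptions rather than the repository's exact settings (in Matterport's API, the backbone and anchor scales live on a `Config` subclass, and which layers to fine-tune is the `layers` argument of `model.train`).

```python
# Illustrative anti-overfitting configuration; not the repository's
# actual values.

class GraspConfig:
    BACKBONE = "resnet50"               # revision 1: pre-trained ResNet-50 backbone
    TRAIN_LAYERS = "heads"              # revision 2: fine-tune non-backbone layers only
    RPN_ANCHOR_SCALES = (64, 128, 256)  # revision 3: 3 anchor types instead of 5
```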
Evaluation
The mAP is 0.901 for RGB input and 0.924 for RGB-D input.
Denoise Autoencoder
The code for the Denoise Autoencoder is based on the repository implemented by DLR-RM. To estimate views more accurately, we redefine the loss function as an L2 loss plus a perceptual loss. The details are shown in the following figure and in our paper.
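The redefined loss can be sketched in NumPy as below. The feature extractor here is a stub standing in for the pre-trained network features used for the perceptual term (VGG-16 in our setup); the weighting is an illustrative assumption.

```python
# Sketch of the combined loss: L2 reconstruction loss plus a perceptual
# loss computed on feature maps. `features` is a stub; in practice the
# perceptual term uses features from a pre-trained VGG-16.
import numpy as np

def features(img):
    """Stub perceptual feature map (per-column mean and std)."""
    return np.stack([img.mean(axis=0), img.std(axis=0)])

def combined_loss(reconstruction, target, weight=1.0):
    l2 = np.mean((reconstruction - target) ** 2)
    perceptual = np.mean((features(reconstruction) - features(target)) ** 2)
    return l2 + weight * perceptual
```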
Dataset
The object views and their corresponding grasp experiences are provided here.
Train
- Download the pre-trained VGG-16 model here and put it in the directory.
- Copy our provided files in the denoise_ae folder to the same directory.
- Start training with the same process as explained in the repository.
Evaluation
The recall of pose estimation on the T-LESS dataset is about 50.31%.