This repository contains the official implementation of PEGG-Net from the paper:
PEGG-Net: Pixel-Wise Efficient Grasp Generation in Complex Scenes
Haozhe Wang, Zhiyang Liu, Lei Zhou, Huan Yin and Marcelo H. Ang Jr.
[Paper][Demo Video]
Please clone this GitHub repo before proceeding with the installation.
git clone https://github.com/HZWang96/PEGG-Net.git
The code was tested on Ubuntu 18.04, with Python 3.6 and PyTorch 1.7.0 (CUDA 11.0). NVIDIA GPUs are needed for both training and testing.
- Create a new conda environment:

  conda create --name peggnet python=3.6
- Install PyTorch 1.7.0 for CUDA 11.0:

  conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
- Install the required Python packages:

  pip install -r requirements.txt
- Install the NVIDIA Container Toolkit.
- Pull the PyTorch 1.7.0 Docker image from Docker Hub:

  docker pull pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
- Run the following command to start the Docker container:

  nvidia-docker run --gpus all --ipc host -it -v <path/to/local/directory>:<workspace/in/docker/container> pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel bash
- Configure the Docker container by running the following commands:

  chmod 755 docker_config
  ./docker_config
- Download and extract the Cornell Grasping Dataset.
- Download and extract the Jacquard Dataset.
- For the Cornell and Jacquard datasets, the folders containing the images and labels should be arranged in the following manner:

  PEGG-Net
  |-- data
      |-- cornell
      |   |-- 01
      |   |-- 02
      |   |-- 03
      |   |-- 04
      |   |-- 05
      |   |-- 06
      |   |-- 07
      |   |-- 08
      |   |-- 09
      |   |-- 10
      |   `-- backgrounds
      `-- jacquard
          |-- Jacquard_Dataset_0
          |-- Jacquard_Dataset_1
          |-- Jacquard_Dataset_2
          |-- Jacquard_Dataset_3
          |-- Jacquard_Dataset_4
          |-- Jacquard_Dataset_5
          |-- Jacquard_Dataset_6
          |-- Jacquard_Dataset_7
          |-- Jacquard_Dataset_8
          |-- Jacquard_Dataset_9
          |-- Jacquard_Dataset_10
          `-- Jacquard_Dataset_11
- For the Cornell Grasping Dataset, convert the PCD files (pcdXXXX.txt) to depth images by running the command below (a sketch of the conversion follows this list):

  python -m utils.dataset_preprocessing.generate_cornell_depth data/cornell
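For reference, the conversion essentially rasterizes each organized point cloud back into a 480x640 depth image. Here is a minimal sketch of the idea, assuming the standard Cornell PCD layout (five columns per point: x, y, z, rgb, index, where the index addresses the organized 480x640 cloud in row-major order). This is illustrative only, not the repository's exact implementation:

```python
# Illustrative sketch: rasterize one Cornell PCD file (pcdXXXX.txt)
# back into a 480x640 depth image using each point's index field.
import numpy as np

def pcd_to_depth(pcd_path, shape=(480, 640)):
    depth = np.zeros(shape, dtype=np.float32)
    with open(pcd_path) as f:
        for line in f:
            ls = line.split()
            if len(ls) != 5:        # skip PCD header lines
                continue
            try:
                x, y, z = float(ls[0]), float(ls[1]), float(ls[2])
            except ValueError:      # skip non-numeric rows
                continue
            i = int(ls[4])          # row-major index into the organized cloud
            r, c = i // shape[1], i % shape[1]
            # Store the point's distance to the camera as depth (an assumption;
            # storing z directly is another common choice).
            depth[r, c] = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    return depth
```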
Run train.py --help to see the full list of options and a description of each one. A basic example would be:
python train.py --description <write a description> --network peggnet --dataset cornell --dataset-path data/cornell --use-rgb 1 --use-depth 0
For training on an image-wise split using the Cornell dataset:
python train.py --description peggnet_iw_rgb_304 --network peggnet --dataset cornell --dataset-path data/cornell --image-wise --use-depth 0 --use-rgb 1 --num-workers 4 --input-size 304
Some important flags are:

- --dataset to select the dataset you want to use for training.
- --dataset-path to provide the path to the selected dataset.
- --image-wise to train the network using an image-wise split.
- --random-seed to set the random seed used for shuffling the dataset.
- --augment to use random rotations and zooms to augment the dataset.
- --input-size to change the size of the input image. Note that the input size must be a multiple of 8.
- --use-rgb to use RGB images during training. Set 1 for true and 0 for false.
- --use-depth to use depth images during training. Set 1 for true and 0 for false.
To train on the Cornell Grasping Dataset using only RGB or depth images, you can use the default hyperparameters and include the --augment flag. For training on the Cornell Grasping Dataset with the image-wise split, also add the --image-wise flag; the random seed (--random-seed) used for shuffling the dataset is 10. An example combining these flags is shown below.
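For instance, a Cornell RGB-only run with augmentation and the image-wise split might look like this (the description string is arbitrary):

python train.py --description peggnet_iw_rgb_aug --network peggnet --dataset cornell --dataset-path data/cornell --image-wise --random-seed 10 --augment --use-rgb 1 --use-depth 0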
When training on the Jacquard Dataset using only RGB or depth images, you can use the default hyperparameters, but do not include the --augment flag.
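For example, a depth-only Jacquard run might look like this (the description string is arbitrary):

python train.py --description peggnet_jacquard_d --network peggnet --dataset jacquard --dataset-path data/jacquard --use-rgb 0 --use-depth 1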
To train on the Cornell Grasping Dataset or the Jacquard Dataset using RGB-D images, change the following hyperparameters (an example command follows this list):

- Set --lr 0.01
- Set --lr-step 25,40
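For instance, an RGB-D run on the Cornell dataset with these settings might look like this (the description string is arbitrary):

python train.py --description peggnet_cornell_rgbd --network peggnet --dataset cornell --dataset-path data/cornell --use-rgb 1 --use-depth 1 --lr 0.01 --lr-step 25,40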
The trained models will be stored in the output/models directory. The TensorBoard log files for each training session will be stored in the tensorboard directory.
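To monitor a training session, point TensorBoard at that directory:

tensorboard --logdir tensorboard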
Run eval.py --help to see the full list of options and a description of each one. Some important flags are:

- --iou-eval to evaluate using the IoU-between-grasping-rectangles metric.
- --jacquard-output to generate output files in the format required for simulated testing against the Jacquard dataset.
- --vis to plot the network output and predicted grasp rectangles.
A basic example would be:
python eval.py --network <path to trained network> --dataset jacquard --dataset-path data/jacquard --jacquard-output --iou-eval
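Similarly, to evaluate a Cornell-trained model and visualize the predicted grasps, a command along these lines should work:

python eval.py --network <path to trained network> --dataset cornell --dataset-path data/cornell --iou-eval --vis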
Connect the inference PC to the same network as the Movo2 PC and set the Movo2 PC as the ROS master.
Bring up the RGB-D aligned RealSense camera ROS node:

roslaunch realsense2_camera rs_aligned_depth.launch

Or bring up the RealSense camera ROS node for depth-only prediction:

roslaunch realsense2_camera rs_camera.launch
To publish the tf of the right end-effector in the right_base_link frame and the calibrated camera extrinsics, run:

movo_tf_publisher/right_base_link.py
movo_tf_publisher/camera_calibration.py
To run prediction with RGB-D input and send the results to the control system:

python pegg_rgbd_prediction.py

Or, to run prediction with depth-only input and send the results to the control system:

python pegg_d_prediction.py

To start the control system:

python pegg_movo_control.py
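For reference, here is a minimal sketch of what an RGB-D prediction node in this style might look like, built on rospy, cv_bridge, and message_filters. The camera topic names are the rs_aligned_depth.launch defaults, but the grasp topic name, message layout, channel ordering, and model-loading details are assumptions for illustration, not the repository's exact implementation:

```python
#!/usr/bin/env python
# Illustrative sketch only: synchronizes aligned RGB and depth frames,
# runs a PEGG-Net-style forward pass, and publishes the best grasp.
import rospy
import torch
import numpy as np
import message_filters
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray


class GraspPredictor(object):
    def __init__(self, model_path, input_size=304):
        self.bridge = CvBridge()
        self.input_size = input_size
        self.net = torch.load(model_path)  # eval.py-style model loading (assumed)
        self.net.eval()
        # Hypothetical output topic: [row, col, angle, width] in image coordinates.
        self.pub = rospy.Publisher('/peggnet/grasp', Float32MultiArray, queue_size=1)
        rgb_sub = message_filters.Subscriber('/camera/color/image_raw', Image)
        depth_sub = message_filters.Subscriber(
            '/camera/aligned_depth_to_color/image_raw', Image)
        sync = message_filters.ApproximateTimeSynchronizer(
            [rgb_sub, depth_sub], queue_size=2, slop=0.05)
        sync.registerCallback(self.callback)

    def callback(self, rgb_msg, depth_msg):
        rgb = self.bridge.imgmsg_to_cv2(rgb_msg, 'rgb8').astype(np.float32)
        depth = self.bridge.imgmsg_to_cv2(depth_msg, 'passthrough').astype(np.float32)

        # Centre-crop both images to the network input size.
        s = self.input_size
        h, w = depth.shape
        top, left = (h - s) // 2, (w - s) // 2
        rgb = rgb[top:top + s, left:left + s] / 255.0
        depth = depth[top:top + s, left:left + s, None] / 1000.0  # mm -> m (assumed)

        # Stack into a 1x4xHxW tensor (channel layout is an assumption).
        x = np.concatenate([depth, rgb], axis=2).transpose(2, 0, 1)[None]
        with torch.no_grad():
            q, cos, sin, width = self.net(torch.from_numpy(x).float())

        # Pick the pixel with the highest grasp quality and decode the angle.
        q = q.squeeze().numpy()
        row, col = np.unravel_index(np.argmax(q), q.shape)
        angle = 0.5 * np.arctan2(sin.squeeze().numpy()[row, col],
                                 cos.squeeze().numpy()[row, col])
        w_px = width.squeeze().numpy()[row, col]
        self.pub.publish(Float32MultiArray(
            data=[float(row), float(col), float(angle), float(w_px)]))


if __name__ == '__main__':
    rospy.init_node('peggnet_rgbd_prediction')
    GraspPredictor('output/models/<trained_network>')  # placeholder path
    rospy.spin()
```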
If you find our work useful for your research, please consider citing it using the following BibTeX entry:
@misc{wang2023peggnet,
title={PEGG-Net: Pixel-Wise Efficient Grasp Generation in Complex Scenes},
author={Haozhe Wang and Zhiyang Liu and Lei Zhou and Huan Yin and Marcelo H. Ang Jr.},
year={2023},
eprint={2203.16301},
archivePrefix={arXiv},
primaryClass={cs.CV}
}