kirumang / Pix2Pose

Original implementation of the paper "Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation", in ICCV 2019, https://arxiv.org/abs/1908.07433

Pix2Pose

Original implementation of the paper: Kiru Park, Timothy Patten and Markus Vincze, "Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation", ICCV 2019, https://arxiv.org/abs/1908.07433

Notice

The code used to produce the results for the BOP Challenge 2020 has been updated. Thanks to the PBR training images provided by the challenge, the results for LM-O, HB, and ITODD are significantly improved.

The modifications from the original implementation of the paper are as follows:

  1. Replaced the encoder with the first three blocks of ResNet-50, using weights pre-trained on ImageNet.

  2. Increased the threshold for inlier pixels during the PnP-RANSAC operation (3 -> 5).

  3. Detection results from Mask-RCNN are reused if predictions for each detection are not successful. In this case, Pix2Pose is performed for other objects that do not have good results yet.

  4. Fixed a minor bug that caused bad detection results for the T-Less dataset (different image resolutions were used during training and inference).

  5. Increased the number of RPN proposals and the NMS threshold in Mask-RCNN (1000/0.7 to 2000/0.9), which produces more detection proposals.

(w/ICP)

  1. Parameters for the ICP refinement are optimized.

  2. Adjusted inlier and outlier thresholds for Pix2Pose (inlier: 0.15 -> 0.2, outlier: [0.15,0.25,0.35] -> [0.2,0.3,0.35]).

  3. The score of each hypothesis is computed with a new formula, max(0,0.2-[depth_difference per pixel])/0.2, instead of counting the number of pixels whose depth difference is less than 0.2.
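
The change in item 3 can be illustrated with a minimal sketch in plain Python. It assumes that the per-pixel scores are summed over the pixels of a hypothesis and that the depth differences are absolute values in the same unit as the 0.2 threshold; neither detail is stated above, and the function names are hypothetical.

```python
def soft_score(depth_diffs, tau=0.2):
    """New scoring: each pixel contributes max(0, tau - d) / tau.

    Assumption: contributions are summed over all pixels of the hypothesis.
    """
    return sum(max(0.0, tau - abs(d)) / tau for d in depth_diffs)


def inlier_count(depth_diffs, tau=0.2):
    """Previous scoring: number of pixels with a depth difference below tau."""
    return sum(1 for d in depth_diffs if abs(d) < tau)


# Hypothetical per-pixel depth differences for one pose hypothesis.
diffs = [0.0, 0.1, 0.2, 0.5]
print(soft_score(diffs), inlier_count(diffs))
```

With the soft score, a pixel that matches the depth exactly counts more than one that is barely inside the threshold, whereas the plain count treats both the same.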

The official results are:

| BOP Score'20 | AVG | LM-O | T-Less | TUD-L | IC-BIN | ITODD | HB | YCB-V |
|---|---|---|---|---|---|---|---|---|
| Pix2Pose (RGB + Depth ICP) | 0.591 | 0.588 | 0.512 | 0.820 | 0.390 | 0.351 | 0.695 | 0.780 |
| Pix2Pose (RGB only) | 0.342 | 0.363 | 0.344 | 0.420 | 0.226 | 0.134 | 0.446 | 0.457 |
| Vidal-Sensors18 (the best in '18, '19) | 0.569 | 0.582 | 0.538 | 0.876 | 0.393 | 0.435 | 0.706 | 0.450 |
| CosyPose-ICP (ECCV'20, the best in '20) | 0.698 | 0.714 | 0.701 | 0.939 | 0.647 | 0.313 | 0.712 | 0.861 |

PBR training images are used for LM-O, IC-BIN, ITODD, and HB without additional images, and real training images are used for T-Less, TUD-L, and YCB-V. To reproduce the same results, use cfg/cfg_bop_2020.json or cfg/cfg_bop_2020_rgb.json (for RGB-only results) with our up-to-date code.

Requirements:

For detection pipelines,

  • Keras implementation of Mask-RCNN: used for LineMOD in the paper and all datasets in the BOP Challenge,
git clone https://github.com/matterport/Mask_RCNN.git
  • Keras implementation of Retinanet: used for evaluation of the T-Less dataset in the paper
git clone https://github.com/fizyr/keras-retinanet.git

Citation

If you use this code, please cite the following

@InProceedings{Park_2019_ICCV,
author = {Park, Kiru and Patten, Timothy and Vincze, Markus},
title = {Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}

Run the recognition for BOP datasets

The original code has been updated to support the format of the most recent 6D pose benchmark, BOP: Benchmark for 6D Object Pose Estimation.

  1. Download a dataset from the BOP website and extract the files into a folder
    • e.g., <path_to_dataset>/<dataset_name>
    • For recognition, at least "Base archive", "Object models", and "Test images" have to be downloaded.
  2. Download and extract the weights into the same dataset folder used in step 1.
  3. Make sure the directories follow the structure below.
    • <path_to_dataset>/<dataset_name>/models or model_eval or model_recont..: model directory that contains .ply files of models
    • <path_to_dataset>/<dataset_name>/models_xyz: norm_factor.json and .ply files of colorized 3D models
    • <path_to_dataset>/<dataset_name>/weight_detection: weight files for the detection
    • <path_to_dataset>/<dataset_name>/pix2pose_weights/<obj_name>/inference.hdf5 : weight files for each object
  4. Set config file
    1. Set directories properly based on your environment
    2. For the bop challenge dataset: <path_to_src>/cfg/cfg_bop2019.json
    3. Use trained weights for the paper: <path_to_src>/cfg/cfg_<dataset_name>_paper.json (e.g., cfg_tless_paper.json)
    4. score_type: 1 - scores from a 2D detection pipeline are used (used for the paper), 2 - scores are calculated using detection score + overlapped mask (only supported for Mask-RCNN, used for the BOP challenge)
    5. task_type : 1 - SiSo task (2017 BOP Challenge), 2 - ViVo task (2019 BOP challenge format)
    6. cand_factor: a factor for the number of detection candidates
  5. Execute the script
python3 tools/5_evaluation_bop_basic.py <gpu_id> <cfg_path> <dataset_name>

To run with the 3D-ICP refinement:

python3 tools/5_evaluation_bop_icp3d.py <gpu_id> <path_cfg_json> <dataset_name>
  6. The output is stored in 'path_to_output' in CSV format, which can be used to calculate metrics with bop_toolkit.
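
Before running the evaluation scripts, the directory layout from step 3 can be checked with a short script. This is a minimal sketch, not part of the repository: the function name and arguments are hypothetical, and it assumes the model directory is named "models" unless told otherwise.

```python
import os


def missing_paths(dataset_root, obj_names, model_dir="models"):
    """Return the expected files/folders that do not exist under dataset_root.

    dataset_root corresponds to <path_to_dataset>/<dataset_name>.
    obj_names are the per-object folder names under pix2pose_weights.
    """
    expected = [
        os.path.join(dataset_root, model_dir),
        os.path.join(dataset_root, "models_xyz", "norm_factor.json"),
        os.path.join(dataset_root, "weight_detection"),
    ]
    expected += [
        os.path.join(dataset_root, "pix2pose_weights", name, "inference.hdf5")
        for name in obj_names
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # Hypothetical dataset location and object name, for illustration only.
    for path in missing_paths("/bop/tless", ["01"]):
        print("missing:", path)
```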

Important note: Unlike in the paper, we used multiple outlier thresholds in the second stage for the BOP challenge, since different parameters for each object or each dataset are not allowed. This is done by setting "outlier_th" to a 1D array (refer to cfg_bop2019.json). In this setup, all values are applied in the second stage, and the result with the largest number of inlier points is selected during estimation. To reproduce the results in the paper with fixed outlier threshold values, a 2D array should be given, as in "cfg_tless_paper.json".
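
The selection over several outlier thresholds can be sketched as a simple loop. This is an illustration under stated assumptions, not the repository's code: `estimate_pose` is a hypothetical callable that runs the second stage for one threshold and returns a pose together with its inlier count, and the toy estimator below only returns made-up numbers.

```python
def select_best_result(outlier_thresholds, estimate_pose):
    """Try every threshold and keep the result with the most inliers.

    estimate_pose: hypothetical callable, threshold -> (pose, n_inliers).
    Returns None when no thresholds are given.
    """
    best = None
    for th in outlier_thresholds:
        pose, n_inliers = estimate_pose(th)
        if best is None or n_inliers > best["n_inliers"]:
            best = {"pose": pose, "n_inliers": n_inliers, "outlier_th": th}
    return best


def toy_estimator(th):
    """Stand-in for the real estimation; the inlier counts are invented."""
    inliers_by_th = {0.15: 40, 0.25: 75, 0.35: 60}
    return "pose@%.2f" % th, inliers_by_th[th]


# A 1D list of thresholds, as in the "outlier_th" setting described above.
result = select_best_result([0.15, 0.25, 0.35], toy_estimator)
print(result["outlier_th"], result["n_inliers"])
```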

(Optional) Environment setup using Docker

  1. Build the image from the Dockerfile: docker build -t <container_name> .
  2. Start the container with
nvidia-docker run -it -v <dataset_dir>:/bop -v <detection_repo>:<detection_dir> -v <other_dir>:<other_dir> <container_name> bash

ROS interface (tested with ROS-Kinetic)

  • Install ros_numpy: pip3 install ros_numpy
  • To run the ROS interface with our Python 3.5 code (ROS-Kinetic uses Python 2.7), a workaround is needed to run the ROS node. For example,
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages:$PYTHONPATH   (including other ROS-related paths)
  • The first path can be replaced with the dist-packages folder of a virtual environment. This way, libraries are loaded from the Python 3.5 path, while ROS-related packages (rospy) are loaded from the ROS library directories for Python 2.7.
  • You have to specify the topic for RGB images and camera intrinsics in the "ros_config.json" file. For more detail, please check out ros_api_manual
  • ICP refinement is performed when the depth image topic is available.
  • The current ros_config.json is set up to detect and estimate poses of YCB-Video objects. Download the trained weights for the YCB-V dataset to run this example.
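
The same search-order idea as the PYTHONPATH export can also be expressed inside a Python script by reordering `sys.path` before other imports. This is only a sketch of the mechanism: the helper name is hypothetical, the path is the one from the example above, and whether this alone is enough for a given ROS installation has not been verified here.

```python
import sys


def prioritize(search_path, preferred):
    """Return a copy of search_path with `preferred` moved (or added) to the front."""
    return [preferred] + [p for p in search_path if p != preferred]


# Path taken from the export example; replace it with the dist-packages
# (or site-packages) folder of your virtual environment if needed.
PY3_PACKAGES = "/usr/local/lib/python3.5/dist-packages"

# Python 3 libraries are now searched first; the remaining entries
# (for example the ROS directories) keep their relative order.
sys.path[:] = prioritize(sys.path, PY3_PACKAGES)
```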

Training for a new dataset

We assume the dataset is organized in the BOP 2019 format. For a new dataset (not in the BOP), modify bop_io.py to provide the proper directories for training. This training code was used to prepare and train the network for the BOP 2019.

1. Convert 3D models to colored coordinate models

python3 tools/2_1_ply_file_to_3d_coord_model.py <cfg_path> <dataset_name>

The script converts the 3D models and saves them to the target folder, together with their dimension information in a file, "norm_factor.json".
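
The idea of a colored coordinate model can be sketched as follows: each vertex position is normalized and stored as a color, and the normalization factors are kept so that a color can be mapped back to a 3D coordinate. The sketch assumes a simple per-axis min-max normalization; the function names and the keys of the dictionary are hypothetical and are not the actual contents of norm_factor.json.

```python
def colorize_vertices(vertices):
    """Map (x, y, z) vertices to (r, g, b) colors in [0, 1].

    Assumption: per-axis min-max normalization. Returns the colors and the
    factors needed to undo the mapping.
    """
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    # Guard against a flat axis (span of zero) to avoid division by zero.
    spans = [(maxs[i] - mins[i]) or 1.0 for i in range(3)]
    colors = [
        tuple((v[i] - mins[i]) / spans[i] for i in range(3)) for v in vertices
    ]
    return colors, {"min": mins, "span": spans}


def color_to_xyz(color, norm_factor):
    """Inverse mapping: recover the 3D coordinate from a color."""
    return tuple(
        color[i] * norm_factor["span"][i] + norm_factor["min"][i]
        for i in range(3)
    )


# Hypothetical vertices of a tiny model.
vertices = [(0.0, 0.0, 0.0), (2.0, 4.0, 8.0), (1.0, 2.0, 4.0)]
colors, norm_factor = colorize_vertices(vertices)
print(colors[2], color_to_xyz(colors[2], norm_factor))
```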

2. Render and generate training pairs

python3 tools/2_2_render_pix2pose_training.py <cfg_path> <dataset_name>

3. Train pix2pose network for each object

python3 tools/3_train_pix2pose.py <cfg_path> <dataset_name> <obj_name> [background_img_folder]

4. Convert the last weight file to an inference file.

python3 tools/4_convert_weights_inference.py <pix2pose_weights folder>

This program looks for the last weight file in each directory
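
How the "last" weight file might be located is sketched below. The assumptions are labeled: weight files are taken to end in .hdf5 and to carry an epoch number in their name, "last" is taken to mean the highest such number, and the function name and example file names are hypothetical. The conversion script itself may use a different rule.

```python
import os
import re


def last_weight_file(weight_dir, ext=".hdf5"):
    """Return the path of the weight file with the highest number in its name.

    Assumption: the last number in the file name (without extension) is the
    epoch. An existing inference.hdf5 is ignored. Returns None if no
    candidate is found.
    """
    candidates = [
        name for name in os.listdir(weight_dir)
        if name.endswith(ext) and name != "inference" + ext
    ]
    if not candidates:
        return None

    def epoch_key(name):
        numbers = re.findall(r"\d+", name[:-len(ext)])
        return (int(numbers[-1]) if numbers else -1, name)

    return os.path.join(weight_dir, max(candidates, key=epoch_key))
```

Sorting by the parsed number rather than by the raw file name avoids ranking an epoch 9 file above an epoch 10 file.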

5. [Optional] Training of 2D detection pipelines (if required, skip this when you have your own 2D detection pipeline)

(1) Generation of images for 2D detection training
python3 tools/1_1_scene_gen_for_detection.py <cfg_path> <dataset_name> <mask=1(true)/0(false)>

Output files

  • a number of augmented images using crops of objects in training images
  • For Mask-RCNN: /mask/*.npy files
  • For Retinanet(Keras-retinanet): gt.csv / label.csv
  • Generated images will be saved in "<path_to_dataset>/<dataset_name>/train_detect/"
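
For orientation, the CSV layout that keras-retinanet reads is one annotation per row (image path, x1, y1, x2, y2, class name) plus a class-mapping file (class name, id). The sketch below writes two such files with the standard library. It assumes that gt.csv is the annotation file and label.csv the class mapping; the function name and the sample values are hypothetical.

```python
import csv


def write_retinanet_csv(annotations, class_names, gt_path, label_path):
    """Write annotation and class-mapping files in keras-retinanet's CSV layout.

    annotations: iterable of (image_path, (x1, y1, x2, y2), class_name).
    class_names: list of class names; ids are assigned in list order from 0.
    """
    with open(gt_path, "w", newline="") as f:
        writer = csv.writer(f)
        for image_path, (x1, y1, x2, y2), class_name in annotations:
            writer.writerow([image_path, x1, y1, x2, y2, class_name])
    with open(label_path, "w", newline="") as f:
        writer = csv.writer(f)
        for class_id, class_name in enumerate(class_names):
            writer.writerow([class_name, class_id])


if __name__ == "__main__":
    # Hypothetical example: one bounding box for one object class.
    write_retinanet_csv(
        [("img_0.png", (10, 20, 110, 220), "obj_01")],
        ["obj_01"],
        "gt.csv",
        "label.csv",
    )
```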
(2) Train Mask-RCNN or Keras-Retinanet

To train Mask-RCNN, the pre-trained weights for the MS-COCO dataset should be placed in <path/to/Mask-RCNN>/mask_rcnn_coco.h5.

python3 tools/1_2_train_maskrcnn.py <cfg_path> <dataset_name>

Alternatively, train Keras-retinanet using the script in its repository. It is highly recommended to initialize the network with the weights trained on the MS-COCO dataset.

keras_retinanet/bin/train.py csv <path_to_dataset>/gt.csv <path_to_dataset>/label.csv --freeze-backbone --weights resnet50_coco_best_v2.1.0.h5

After training, the weights should be converted into an inference model:

keras_retinanet/bin/convert_model.py /path/to/training/model.h5 /path/to/save/inference/model.h5

Download trained weights

  • Please refer to the paper for other details regarding the training

    • T-Less: 2D Retinanet weights + Pix2Pose weights link
      • The provided real training images (primesense) are used for training.
      • Reconstructed models are used to calculate VSD scores.
      • To test using all test images, download and copy all_target_tless.json file into the dataset folder (together with the test_targets_bop19.json file)

Download: trained weights for the BOP challenge 2020

These trained weights were used to submit the results for the core datasets in the BOP Challenge 2020.

First of all, the norm_factor files have to be downloaded and placed in the following path: [path/to/bop_dataset]/[dataset_name]/models_xyz/norm_factor.json

Download link: Norm_factor files for all 7 datasets

Download the zip files and extract them to the BOP dataset folder. For example, for T-Less, the extracted files should be placed in

  • [path to bop dataset]/tless/weight_detection/tless20190927T0827/mask_rcnn_tless_0005.h5
  • [path to bop dataset]/tless/pix2pose_weights/[obj_no]

License: MIT License

