GhaziXX / 3d-bounding-box-detection

3d bounding box detection for autonomous driving


3D bounding box detection using deep learning and geometry


Introduction

The primary purpose of this project is to implement the 3D object detection pipeline introduced in the paper "3D Bounding Box Estimation Using Deep Learning and Geometry" for detecting four classes: Car, Truck, Pedestrian, and Cyclist.

The paper mainly focuses on estimating 3D bounding boxes from given 2D detections; however, we implemented both the 2D and 3D detection parts in this project.

This repository is organized to be reproducible; two notebooks and the necessary configuration files are made available. The first notebook is 2d-bounding-box, dedicated to using transfer learning to create a 2D object detector on the KITTI dataset. The TensorFlow Object Detection framework was used in this process. It includes:

  1. Data preparation
  2. Model Training
  3. Model evaluation
  4. Model saving

The second notebook is 3d-bounding-box, where we implemented the 3D bounding box detector using TensorFlow. The notebook includes the following:

  1. Data Preparation
  2. 3D model implementation
  3. Input and Output Processing
  4. Using both the 2D and 3D models for 3D predictions (see the sketch after this list).
  5. Visually evaluating the 3D model's performance.
  6. Measuring dimension accuracy*.

* Other estimations are evaluated before and after training the models (2D and 3D). However, since the dimension vector is regressed through geometry constraints, it is evaluated separately.
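For readers who want the big picture before opening the notebook, here is a minimal sketch of how the two models are chained at inference time (step 4 above). The names `detector_2d`, `model_3d`, and `detect_3d` are illustrative placeholders, not the notebook's actual API; the 224×224 patch size matches the 3D model's input reported below.

```python
import tensorflow as tf

def detect_3d(image, detector_2d, model_3d, patch_size=(224, 224)):
    """Illustrative 2D -> 3D pipeline: detect 2D boxes, then regress
    dimensions, orientation, and bin confidences per cropped patch."""
    # Step 1: hypothetical 2D detector returns pixel boxes
    # (ymin, xmin, ymax, xmax) and class ids.
    boxes, classes = detector_2d(image)

    results = []
    for (ymin, xmin, ymax, xmax), cls in zip(boxes, classes):
        # Step 2: crop the 2D detection and resize it to the
        # 3D model's input size.
        patch = image[ymin:ymax, xmin:xmax]
        patch = tf.image.resize(patch, patch_size) / 255.0

        # Step 3: the 3D model predicts dimensions, per-bin
        # orientation, and bin confidences for the patch.
        dims, orient, conf = model_3d(patch[tf.newaxis, ...])
        results.append({"class": cls,
                        "box_2d": (ymin, xmin, ymax, xmax),
                        "dimensions": dims[0],
                        "orientation": orient[0],
                        "bin_confidence": conf[0]})
    return results
```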

Demo

The following images present different scenarios and objects. On the left are the results of 2D bounding box detection, and on the right are the results of 3D bounding box detection based on the 2D detections.

The following video, composed of different sequences, shows the prediction results:

https://drive.google.com/file/d/14NsNLniK3n9FU8o_lDs-N5S0KtqVG0W6/view?usp=sharing

2D Object Detection

The purpose of the 2D detection stage is to take an image and detect Cars, Trucks, Pedestrians, and Cyclists with 2D bounding boxes. We also need to classify the resulting patches, because those class labels are required later when estimating object dimensions.

Since 2D object detection is already a well-addressed problem, we kept things simple and used a high-level API (the TensorFlow Object Detection API) with pre-trained models.

We used the faster_rcnn_resnet101_kitti Faster RCNN checkpoint from the TensorFlow 1.x model zoo. The model is pre-trained on the KITTI 2D object dataset but unfortunately does not detect all of our classes, so we had to fine-tune it to cover the classes we needed; the training process was straightforward. The following table summarizes the results after fine-tuning.

  • Train size: 6981 images.
  • Test size: 500 images.
  • IoU: 0.5
| Model | Previous Dataset | Input Dimensions | Iterations | val AP Car | val AP Cyclist | val AP Pedestrian | val AP Truck | mAP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster RCNN (ResNet 101) | KITTI | 300×993×3 | 25000 | 0.8590 | 0.6518 | 0.6079 | 0.7699 | 0.722 |

Training with the TensorFlow Object Detection API requires fixing some paths and training configurations. The pipeline.config file for Faster RCNN is included, and you can use it straight away as we tried to find the best configuration possible. Please make sure to modify the data paths inside that file before using it.
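For reference, these are the fields that typically need updating in a TensorFlow Object Detection API pipeline.config; all paths below are placeholders for your own Drive or local layout:

```
# Excerpt of the fields that usually need editing in pipeline.config.
# Every path here is a placeholder; point it at your own files.
train_config {
  fine_tune_checkpoint: "/path/to/faster_rcnn_resnet101_kitti/model.ckpt"
}
train_input_reader {
  label_map_path: "/path/to/kitti_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/path/to/train.record"
  }
}
eval_input_reader {
  label_map_path: "/path/to/kitti_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/path/to/val.record"
  }
}
```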

The complete workflow, including setting up the environment, creating TFRecords, extracting frozen graphs, and evaluating on the test set, is provided in the 2d-object-detection notebook. We highly recommend using Google Colab and Google Drive for training and for hosting the data: the environment is somewhat sensitive and requires specific package versions and a Linux environment.
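As orientation for the evaluation step, this is a minimal sketch of running inference with a frozen TF1 graph. The input and output tensor names are the standard ones exported by the TensorFlow 1.x Object Detection API; the graph path and the dummy image are placeholders.

```python
import numpy as np
import tensorflow.compat.v1 as tf

# Load the exported frozen inference graph (path is a placeholder).
graph_def = tf.GraphDef()
with tf.gfile.GFile("/path/to/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # Batched uint8 image; replace with a real frame.
    image = np.zeros((1, 300, 993, 3), dtype=np.uint8)
    # Standard output tensors of a TF1 Object Detection export.
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": image})
```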

3D Object Detection

To implement the 3D object detection, we used an open-source repository as our baseline and guide. We focused on updating the code to make it compatible with newer versions of TensorFlow and, more importantly, on improving the model's accuracy. We also prepared well-documented notebooks to make the model architecture and the modelling process easier to understand, and added many useful functions for visualization and testing. The final results of the 3D detection are presented in the table below:

  • Train size: 19308 patches extracted directly from the dataset.
  • Test size: 3408 patches extracted directly from the dataset.
  • Confidence error metric: MSE
  • Dimension error metric: MSE
  • Angle error metric: 1 - cos(|pred| - true) (ideal would be 1 - cos(0) = 0; see the sketch after the table)
| Feature Extractor | Previous Dataset | Input Dimensions | Epochs | # Parameters | Confidence Error | Angle Error | Dimension Error | val Confidence Error | val Angle Error | val Dimension Error |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNetV2 | ImageNet | 224×224×3 | 50 (early stopping at epoch 28) | 66,490,453 | 8.7705e-04 | 0.2608 | 0.0391 | 0.0135 | 0.2672 | 0.0531 |
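As a minimal sketch, the angle error above can be written as a TensorFlow metric, assuming `y_true` and `y_pred` hold orientation angles in radians (the notebook's actual tensor layout may differ):

```python
import tensorflow as tf

def angle_error(y_true, y_pred):
    # 1 - cos(|pred| - true): 0 for a perfect prediction,
    # growing towards 2 as the angular difference approaches pi.
    return tf.reduce_mean(1.0 - tf.cos(tf.abs(y_pred) - y_true))
```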

The following table presents the translation vector estimation accuracy.

  • Error metric: L2_norm(true - pred)
  • Normalized Error metric: L2_norm( (true - pred) / |true| ) (see the sketch after the table)
| Max Truncation | Max Occlusion | Min 2D Bbox Width | Min 2D Bbox Height | Final Sample Count | Average Error | Average Normalized Error * |
| --- | --- | --- | --- | --- | --- | --- |
| 0.15 | 1 | 60 | 60 | 967 | 1.8176 | 0.1151 |
| 0.40 | 1 | 40 | 40 | 1788 | 2.4165 | 0.1168 |
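For clarity, the two metrics above can be computed as in the following NumPy sketch, where `true` and `pred` are hypothetical N×3 arrays of translation vectors and |true| is taken to be the L2 norm of each ground-truth vector:

```python
import numpy as np

def translation_errors(true, pred):
    """Average error and average normalized error over N x 3
    arrays of ground-truth and predicted translation vectors."""
    error = np.linalg.norm(true - pred, axis=1)        # L2_norm(true - pred)
    norm_error = error / np.linalg.norm(true, axis=1)  # scaled by |true|
    return error.mean(), norm_error.mean()
```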

Contribution

References

Papers and Datasets

Codes and Tools
