Real-time 3D Object Detection using Feature Map Flow

Abstract

Three-dimensional object detection and tracking from point clouds is an important aspect in autonomous driving tasks for robots and vehicles where objects can be represented as 3D boxes. Accurate understanding of the surrounding environment is critical for successful autonomous driving. In this paper, we present an approach of considering time-spatial feature map aggregation from different time steps of deep neural model inference (named feature map flow, FMF). We propose several versions of the FMF: from common concatenation to context-based feature map fusion and odometry usage for previous feature map affine transform. Proposed approach significantly improves the quality of 3D detection and tracking baseline. Our centerbased model achieved better performance on the nuScenes benchmark for both 3D detection and tracking, with 3-4% mAP, 1-2% NDS, and 1-3% AMOTA higher than a baseline state-of-the-art model. We performed a software implementation of the proposed method optimized for the NVidia Jetson AGX Xavier single-board computer with point cloud processing speed of 6-9 FPS.

Main Results

3D detection on nuScenes test set

	MAP ↑	NDS ↑	FPS ↑
VoxelNet	58.0	65.9	17
PointPillars	53.8	62.7	29

3D Tracking on nuScenes test set

	AMOTA ↑	IDS ↓
VoxelNet	61.2	870
PointPillars	58.1	736

3D detection on Waymo test set

	Veh ↑	Ped ↑	Cyc ↑	mAPH ↑	Latency,ms
VoxelNet	70.74	65.46	67.63	67.95	86.11
PointPillars (FP16)	69.65	54.61	62.28	62.18	62.30

3D detection on Waymo Val set

	Veh ↑	Ped ↑	Cyc ↑	mAPH ↑	Latency,ms
VoxelNet	66.68	63.62	68.64	65.98	86.11
VoxelNet (FP16)	66.68	63.06	67.24	65.85	77.53
PointPillars	62.35	59.67	66.71	62.43	82.06
PointPillars (FP16)	61.75	58.12	65.26	62.19	62.30

All results are tested on a RTX 3060 ti GPU with batch size 1.

Use FMFNet

Follow the provided steps to reproduce our results on nuScenes validation and test sets and get pretrained models.

Please refer to INSTALL to run the docker container for FMFNet. For training and testing on nuScenes, please follow the instructions in START. For WAYMO dataset, you can check START_WAYMO

Lisence

FMFNet is released under MIT license (see LICENSE). It is developed based on a forked version of CenterPoint. We also used code from det3d, CenterNet and CenterTrack.

Contact

Questions and suggestions are welcome!

Youshaa Murhij yosha.morheg@phystech.edu

About

Recurrent Center-based 3D Object Detection Pipeline.

MIT License

Languages

Language:Python 88.2%Language:Cuda 6.9%Language:C++ 4.6%Language:Dockerfile 0.2%Language:Shell 0.2%