chaytonmin / Occ-BEV

Multi-Camera Unified Pre-training via 3D Scene Reconstruction for DETR3D, BEVFormer, BEVDet, BEVDepth and Semantic Occupancy Prediction

Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction

(for DETR3D, BEVFormer, BEVDet, BEVDepth and Semantic Occupancy Prediction)

Paper in arXiv

Abstract

Multi-camera 3D perception has emerged as a prominent research field in autonomous driving, offering a viable and cost-effective alternative to LiDAR-based solutions. However, existing multi-camera algorithms primarily rely on monocular image pre-training, which overlooks the spatial and temporal correlations among different camera views. To address this limitation, we propose the first multi-camera unified pre-training framework, called Occ-BEV, which first reconstructs the 3D scene as a foundational stage and then fine-tunes the model on downstream tasks. Specifically, a 3D decoder is designed to leverage Bird's Eye View (BEV) features from multi-view images to predict 3D geometric occupancy, enabling the model to capture a more comprehensive understanding of the 3D environment. One significant advantage of Occ-BEV is that it can utilize a vast amount of unlabeled image-LiDAR pairs for pre-training. The proposed multi-camera unified pre-training framework demonstrates promising results in key tasks such as multi-camera 3D object detection and semantic scene completion. Compared to monocular pre-training methods on the nuScenes dataset, Occ-BEV achieves an improvement of 2.0% in mAP and 2.0% in NDS for 3D object detection, as well as a 3% increase in mIoU for semantic scene completion.

Methods

[Figure: overview of the Occ-BEV method]
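The abstract describes pre-training by predicting 3D geometric occupancy, with unlabeled image-LiDAR pairs as the data source. One plausible way to obtain occupancy targets from a LiDAR sweep is to voxelize the point cloud and mark every voxel that contains at least one point. The following is a minimal sketch of that idea only, not the repository's implementation; the function names, the point cloud range, and the voxel size are all hypothetical.

```python
def grid_shape(pc_range, voxel_size):
    """Number of voxels along x, y, z for a given range and voxel size."""
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    return (
        round((x_max - x_min) / voxel_size),
        round((y_max - y_min) / voxel_size),
        round((z_max - z_min) / voxel_size),
    )


def voxelize(points, pc_range, voxel_size):
    """Return the set of (ix, iy, iz) voxel indices that contain a point.

    points:     iterable of (x, y, z) LiDAR coordinates (assumed metres)
    pc_range:   (x_min, y_min, z_min, x_max, y_max, z_max), hypothetical
    voxel_size: edge length of a cubic voxel, hypothetical
    """
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    occupied = set()
    for x, y, z in points:
        # Skip points that fall outside the region of interest.
        if not (x_min <= x < x_max and y_min <= y < y_max and z_min <= z < z_max):
            continue
        occupied.add((
            int((x - x_min) // voxel_size),
            int((y - y_min) // voxel_size),
            int((z - z_min) // voxel_size),
        ))
    return occupied


if __name__ == "__main__":
    # Toy sweep: two points share a voxel, one point is out of range.
    sweep = [(0.5, 0.5, 0.5), (0.6, 0.7, 0.2), (3.9, 0.1, 1.2), (10.0, 0.0, 0.0)]
    toy_range = (0.0, 0.0, 0.0, 4.0, 4.0, 2.0)
    labels = voxelize(sweep, toy_range, voxel_size=1.0)
    print(grid_shape(toy_range, 1.0), sorted(labels))
```

Every voxel in the returned set would be treated as occupied (label 1) and all remaining voxels in the grid as empty (label 0), giving a binary target that a 3D decoder on top of BEV features could be trained to predict without manual annotation.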

Getting Started

Model Zoo

| Backbone | Method | Pre-training | Lr Schd | NDS | mAP | Config |
| --- | --- | --- | --- | --- | --- | --- |
| R101-DCN | BEVFormer | ImageNet | 24ep | 47.7 | 37.7 | config/[model] |
| R101-DCN | BEVFormer | ImageNet + Occ-BEV | 24ep | 50.0 | 39.7 | config/[model] |
| R101-DCN | BEVFormer | FCOS3D | 24ep | 51.7 | 41.6 | config/model |
| R101-DCN | BEVFormer | FCOS3D + Occ-BEV | 24ep | 53.4 | 43.8 | config/pre-trained model/log |
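To read the gain from Occ-BEV pre-training directly off the rows above, the short script below subtracts each baseline from its Occ-BEV counterpart. The numbers are copied from the table; the dictionary layout and function name are only illustrative.

```python
# (NDS, mAP) pairs copied from the Model Zoo table (R101-DCN, BEVFormer, 24ep).
RESULTS = {
    "ImageNet": {"baseline": (47.7, 37.7), "with_occ_bev": (50.0, 39.7)},
    "FCOS3D": {"baseline": (51.7, 41.6), "with_occ_bev": (53.4, 43.8)},
}


def gains(results):
    """Return {initialization: (NDS gain, mAP gain)} rounded to one decimal."""
    out = {}
    for init, rows in results.items():
        nds0, map0 = rows["baseline"]
        nds1, map1 = rows["with_occ_bev"]
        out[init] = (round(nds1 - nds0, 1), round(map1 - map0, 1))
    return out


if __name__ == "__main__":
    for init, (d_nds, d_map) in gains(RESULTS).items():
        print(f"{init}: NDS +{d_nds}, mAP +{d_map}")
    # ImageNet: NDS +2.3, mAP +2.0
    # FCOS3D: NDS +1.7, mAP +2.2
```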

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{occ-bev,
  title={Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction},
  author={Chen Min and Xinli Xu and Dawei Zhao and Liang Xiao and Yiming Nie and Bin Dai},
  journal={arXiv preprint},
  year={2023}
}

Acknowledgement

Many thanks to these excellent open source projects:

License: MIT License