BEVPerception-Survey-Recipe

Awesome BEV perception papers and cookbook for achieving SOTA results

Overview of BEV Perception

The general picture of BEV perception at a glance. It consists of three sub-parts based on the input modality: BEV Camera, BEV LiDAR, and BEV Fusion. BEV perception is a general task built on top of a series of fundamental tasks. To give a more complete view of perception algorithms in autonomous driving, we list other topics as well.

Datasets of BEV Perception

Academic Summary of BEV Perception

Important BEV perception methods from recent years, covering different modalities and tasks.

Performance of important BEV perception methods from recent years, across different settings and leaderboards.

BEV Camera

A general pipeline in BEV Camera

Related literature:

Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D / paper / project / ECCV 2020 / LSS
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View / paper / project / arXiv / BEVDet
BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection / paper / project / arXiv / BEVDet4D
BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection / paper / project / arXiv / BEVDepth
DSGN: Deep Stereo Geometry Network for 3D Object Detection / paper / supplemental / project / CVPR 2020
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-Based 3D Detector / paper / supplemental / project / ICCV 2021
Is Pseudo-Lidar Needed for Monocular 3D Object Detection? / paper / supplemental / project / ICCV 2021
Inverse perspective mapping simplifies optical flow computation and obstacle detection / paper / ? / IPM
Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image / paper / IV 2019
Learning to Map Vehicles into Bird’s Eye View / ICIAP 2017
Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography / paper / IROS 2021
Driving Among Flatmobiles: Bird-Eye-View Occupancy Grids From a Monocular Camera for Holistic Trajectory Planning / paper / WACV 2021
Understanding Bird’s-Eye View of Road Semantics Using an Onboard Camera / paper / project / IEEE Robotics and Automation Letters 2022
Automatic dense visual semantic mapping from street-level imagery / paper / IROS 2012
Stacked Homography Transformations for Multi-View Pedestrian Detection / paper / ICCV 2021
Cross-View Semantic Segmentation for Sensing Surroundings / paper / project / IEEE Robotics and Automation Letters 2020
FISHING Net: Future Inference of Semantic Heatmaps In Grids / paper / arXiv
NEAT: Neural Attention Fields for End-to-End Autonomous Driving / paper / project / ICCV 2021
Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation / paper / project / CVPR 2021
Bird’s-Eye-View Panoptic Segmentation Using Monocular Frontal View Images / paper / project / IEEE Robotics and Automation Letters 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers / paper / project / ECCV 2022
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark / paper / project / ECCV 2022
PETR: Position Embedding Transformation for Multi-View 3D Object Detection / paper / project / ECCV 2022
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries / paper / project / PMLR 2022
Translating Images into Maps / paper / project / ICRA 2022
GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation / paper / ECCV 2022
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images / paper / project / arXiv
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection / paper / supplemental / project / WACV 2022
MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones / paper / project / arXiv
FIERY: Future Instance Prediction in Bird's-Eye View From Surround Monocular Cameras / paper / supplemental / paper / ICCV 2021
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving / paper / project / arXiv

BEV LiDAR

A general pipeline in BEV LiDAR

Related literature:

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection / paper / supplemental / VoxelNet
SECOND: Sparsely Embedded Convolutional Detection / paper / project / Sensors 2018 / SECOND
Center-Based 3D Object Detection and Tracking / paper / supplemental / project / CVPR 2021 / CenterPoint
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection / paper / project / CVPR 2020 / PV-RCNN
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection / paper / project / arXiv / PV-RCNN++
Structure Aware Single-Stage 3D Object Detection From Point Cloud / paper / project / CVPR 2020 / SA-SSD
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection / paper / project / AAAI 2021 / Voxel R-CNN
Object DGCNN: 3D Object Detection using Dynamic Graphs / paper / NeurIPS 2021 / Object DGCNN
Voxel Transformer for 3D Object Detection / paper / ICCV 2021 / VoTr
Embracing Single Stride 3D Object Detector With Sparse Transformer / paper / supplemental / project / CVPR 2022 / SST
AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds / paper / AAAI 2022 / AFDetV2
PointPillars: Fast Encoders for Object Detection From Point Clouds / paper / CVPR 2019 / PointPillars

BEV Fusion

BEV Fusion related literature

Unifying Voxel-based Representation with Transformer for 3D Object Detection / paper / project / arXiv
MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting Through Multi-View Fusion of LiDAR Data / paper / CVPR 2021 / MVFuseNet
UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View / paper / arXiv
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation / paper / project / arXiv / BEVFusion
BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework / paper / project / arXiv / BEVFusion

Industrial Roadmap of BEV Perception

Practical Recipe of BEV Perception

BEV Camera

BEV LiDAR

Conventional Methods: Camera 3D Object Detection

Monocular 3D Object Detection for Autonomous Driving / paper / CVPR 2016 / Mono3D
3D Bounding Box Estimation Using Deep Learning and Geometry / paper / CVPR 2017 / Deep3DBox
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare / paper / video / project / CVPR 2018 / 3D-RCNN
Objects as Points / paper / project / arXiv / CenterNet
Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving / paper / supplemental / project / CVPR 2019 / Pseudo-Lidar
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection / paper / video / project / ICCV 2019 / M3D-RPN
Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction / paper / supplemental / project / CVPR 2019 / MonoPSR
Orthographic Feature Transform for Monocular 3D Object Detection / paper / project / arXiv / OFTNet
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape / paper / supplemental / CVPR 2019 / ROI-10D
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation / paper / project / CVPR 2020 / SMOKE
Categorical Depth Distribution Network for Monocular 3D Object Detection / paper / supplemental / project / CVPR 2021 / CaDDN
FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection / paper / supplemental / project / ICCV 2021 / FCOS3D
FCOS: Fully Convolutional One-Stage Object Detection / paper / project / ICCV 2019 / FCOS
Probabilistic and Geometric Depth: Detecting Objects in Perspective / paper / project / ? / PGD

Conventional Methods: LiDAR Detection

Deep Hough Voting for 3D Object Detection in Point Clouds / paper / supplemental / video / project / ICCV 2019 / VoteNet
PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud / paper / project / CVPR 2019 / PointRCNN
From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network / paper / project / TPAMI 2021 / Part-$A^2$
H3DNet: 3D Object Detection Using Hybrid Geometric Primitives / paper / project / ECCV 2020 / H3DNet
3D Object Detection With Pointformer / paper / supplemental / project / CVPR 2021 / Pointformer
Back-Tracing Representative Points for Voting-Based 3D Object Detection in Point Clouds / paper / project / CVPR 2021 / BRNet
Group-Free 3D Object Detection via Transformers / paper / supplemental / project / ICCV 2021 / Group-Free
RBGNet: Ray-Based Grouping for 3D Object Detection / paper / supplemental / project / CVPR 2022 / RBGNet
3DSSD: Point-Based 3D Single Stage Object Detector / paper / project / CVPR 2020 / 3DSSD

Conventional Methods: LiDAR Segmentation

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation / paper / supplemental / video / project / CVPR 2017 / PointNet
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space / paper / project / NIPS 2017 / PointNet++
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters / paper / project / ECCV 2018 / SpiderCNN
Dynamic Graph CNN for Learning on Point Clouds / paper / project / ACM Transactions on Graphics 2019 / DGCNN
KPConv: Flexible and Deformable Convolution for Point Clouds / paper / supplemental / project / ICCV 2019 / KPConv
Point Transformer / paper / supplemental / project / ICCV 2021 / Point Transformer
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds / paper / supplemental / project / CVPR 2020 / RandLA-Net
PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation / paper / video / project / CVPR 2020 / PolarNet
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation / paper / project / CVPR 2021 / Cylinder3D
(AF)2-S3Net: Attentive Feature Fusion With Adaptive Feature Selection for Sparse Semantic Segmentation Network / paper / CVPR 2021 / (AF)$^2$-S3Net
TORNADO-Net: mulTiview tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module / paper / ICRA 2021 / TornadoNet
AMVNet: Assertion-based Multi-View Fusion Network for LiDAR Semantic Segmentation / paper / arXiv / AMVNet
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation / paper / ICCV 2021 / DRINet
DRINet++: Efficient Voxel-as-point Point Cloud Segmentation / paper / arXiv / DRINet++
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution / paper / project / ECCV 2020 / SPVConv
RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation / paper / ICCV 2021 / RPVNet
Learning 3D Semantic Segmentation with only 2D Image Supervision / paper / 3DV 2021 / 2D3DNet
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds / paper / project / ECCV 2022 / 2DPASS

Conventional Methods: Sensor Fusion

MVX-Net: Multimodal VoxelNet for 3D Object Detection / paper / ICRA 2019 / MVX-Net
Multi-Task Multi-Sensor Fusion for 3D Object Detection / paper / CVPR 2019 / MMF
Deep Continuous Fusion for Multi-Sensor 3D Object Detection / paper / ECCV 2018 / ContFuse
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection / paper / project / CVPR 2021 / PointAugmenting
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection / paper / project / ECCV 2022 / AutoAlignV2
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection / paper / supplemental / project / CVPR 2022 / DeepFusion
CenterFusion: Center-Based Radar and Camera Fusion for 3D Object Detection / paper / project / WACV 2021 / CenterFusion
FUTR3D: A Unified Sensor Fusion Framework for 3D Detection / paper / arXiv / FUTR3D
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers / paper / supplemental / project / CVPR 2022 / TransFusion
DeepInteraction: 3D Object Detection via Modality Interaction / paper / project / arXiv / DeepInteraction
PointPainting: Sequential Fusion for 3D Object Detection / paper / supplemental / CVPR 2020 / PointPainting
Frustum PointNets for 3D Object Detection From RGB-D Data / paper / supplemental / project / CVPR 2018 / F-PointNet
Multi-View 3D Object Detection Network for Autonomous Driving / paper / video / CVPR 2017 / MV3D
Joint 3D Proposal Generation and Object Detection from View Aggregation / paper / project / IROS 2018 / AVOD
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection / paper / project / IROS 2020 / CLOCs

franter666 / BEVPerception-Survey-Recipe