
RoboBEV: Robust Bird's Eye View Detection under Corruptions


RoboBEV: Towards Robust Bird's Eye View Detection under Corruptions

Shaoyuan Xie   Lingdong Kong   Wenwei Zhang   Jiawei Ren   Liang Pan   Kai Chen   Ziwei Liu

About

RoboBEV is the first robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) detection under natural corruptions. It includes eight corruption types that are likely to appear in driving scenarios, spanning four categories: (1) sensor failure, (2) motion & data processing, (3) lighting conditions, and (4) weather conditions.

(Figure: clean vs. corrupted examples across the six camera views: FRONT_LEFT, FRONT, FRONT_RIGHT, BACK_LEFT, BACK, BACK_RIGHT.)

Visit our project page to explore more examples. 🚙

Updates

  • [2023.02] - The release of the nuScenes-C dataset is pending a careful check of potential IP issues.
  • [2023.01] - Launch of the RoboBEV benchmark! In this initial version, we include 11 camera-only BEV detection algorithms (22 variants), evaluated with 8 corruption types across 3 severity levels.

Outline

Installation

Kindly refer to INSTALL.md for the installation details.

Data Preparation

Kindly refer to DATA_PREPARE.md for the details to prepare the nuScenes and nuScenes-C datasets.

Getting Started

Kindly refer to GET_STARTED.md to learn more about using this codebase.

Taxonomy

Kindly refer to DEMO.md to explore more visual examples for each corruption type.

Model Zoo

 Camera-Only BEV Detection
 LiDAR-Camera Fusion BEV Detection

Robustness Benchmark

📊 Metrics: The nuScenes Detection Score (NDS) is consistently used as the main indicator of model performance in our benchmark. The following two metrics are adopted to compare the robustness of different models:

  • mCE (the lower the better): the average corruption error (in percentage) of a candidate model relative to the baseline model, computed over all corruption types and three severity levels.
  • mRR (the higher the better): the average resilience rate (in percentage) of a candidate model relative to its "clean" performance, computed over all corruption types and three severity levels.

A symbol next to a model name denotes the baseline model adopted in the mCE calculation. For more detailed experimental results, please refer to docs/results.
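The two metrics can be sketched in a few lines of Python. This is a minimal sketch that assumes the per-corruption scores are already averaged over the three severity levels (the official implementation computes corruption error per severity level before averaging); the dictionary below uses the DETR3D numbers from the benchmark table.

```python
def mce(model_nds, baseline_nds):
    """Mean Corruption Error (%): per-corruption error of the candidate
    relative to the baseline, averaged over all corruption types.
    Both dicts map corruption type -> severity-averaged NDS."""
    ces = [(1.0 - model_nds[c]) / (1.0 - baseline_nds[c]) for c in baseline_nds]
    return 100.0 * sum(ces) / len(ces)

def mrr(model_nds, clean_nds):
    """Mean Resilience Rate (%): corrupted NDS relative to clean NDS,
    averaged over all corruption types."""
    return 100.0 * sum(model_nds.values()) / (len(model_nds) * clean_nds)

# DETR3D scores from the benchmark table
detr3d = {
    "cam_crash": 0.2859, "frame_lost": 0.2604, "color_quant": 0.3177,
    "motion_blur": 0.2661, "bright": 0.4002, "low_light": 0.2786,
    "fog": 0.3912, "snow": 0.1913,
}
print(f"{mce(detr3d, detr3d):.2f}")  # baseline against itself -> 100.00
print(f"{mrr(detr3d, 0.4224):.2f}")  # -> 70.77, matching the table
```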

| Model | mCE (%) $\downarrow$ | mRR (%) $\uparrow$ | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DETR3D | 100.00 | 70.77 | 0.4224 | 0.2859 | 0.2604 | 0.3177 | 0.2661 | 0.4002 | 0.2786 | 0.3912 | 0.1913 |
| DETR3D<sub>CBGS</sub> | 99.21 | 70.02 | 0.4341 | 0.2991 | 0.2685 | 0.3235 | 0.2542 | 0.4154 | 0.2766 | 0.4020 | 0.1925 |
| BEVFormer<sub>Small</sub> | 101.23 | 59.07 | 0.4787 | 0.2771 | 0.2459 | 0.3275 | 0.2570 | 0.3741 | 0.2413 | 0.3583 | 0.1809 |
| BEVFormer<sub>Base</sub> | 97.97 | 60.40 | 0.5174 | 0.3154 | 0.3017 | 0.3509 | 0.2695 | 0.4184 | 0.2515 | 0.4069 | 0.1857 |
| PETR<sub>R50-p4</sub> | 111.01 | 61.26 | 0.3665 | 0.2320 | 0.2166 | 0.2472 | 0.2299 | 0.2841 | 0.1571 | 0.2876 | 0.1417 |
| PETR<sub>VoV-p4</sub> | 100.69 | 65.03 | 0.4550 | 0.2924 | 0.2792 | 0.2968 | 0.2490 | 0.3858 | 0.2305 | 0.3703 | 0.2632 |
| ORA3D | 99.17 | 68.63 | 0.4436 | 0.3055 | 0.2750 | 0.3360 | 0.2647 | 0.4075 | 0.2613 | 0.3959 | 0.1898 |
| BEVDet<sub>R50</sub> | 115.12 | 51.83 | 0.3770 | 0.2486 | 0.1924 | 0.2408 | 0.2061 | 0.2565 | 0.1102 | 0.2461 | 0.0625 |
| BEVDet<sub>R101</sub> | 113.68 | 53.12 | 0.3877 | 0.2622 | 0.2065 | 0.2546 | 0.2265 | 0.2554 | 0.1118 | 0.2495 | 0.0810 |
| BEVDet<sub>SwinT</sub> | 116.48 | 46.26 | 0.4037 | 0.2609 | 0.2115 | 0.2278 | 0.2128 | 0.2191 | 0.0490 | 0.2450 | 0.0680 |
| BEVDepth<sub>R50</sub> | 110.02 | 56.82 | 0.4058 | 0.2638 | 0.2141 | 0.2751 | 0.2513 | 0.2879 | 0.1757 | 0.2903 | 0.0863 |
| BEVerse<sub>SwinT</sub> | 110.67 | 48.60 | 0.4665 | 0.3181 | 0.3037 | 0.2600 | 0.2647 | 0.2656 | 0.0593 | 0.2781 | 0.0644 |
| BEVerse<sub>SwinS</sub> | 117.82 | 49.57 | 0.4951 | 0.3364 | 0.2485 | 0.2807 | 0.2632 | 0.3394 | 0.1118 | 0.2849 | 0.0985 |
| PolarFormer<sub>R101</sub> | 96.06 | 70.88 | 0.4602 | 0.3133 | 0.2808 | 0.3509 | 0.3221 | 0.4304 | 0.2554 | 0.4262 | 0.2304 |
| PolarFormer<sub>VoV</sub> | 98.75 | 67.51 | 0.4558 | 0.3135 | 0.2811 | 0.3076 | 0.2344 | 0.4280 | 0.2441 | 0.4061 | 0.2468 |
| SRCN3D<sub>R101</sub> | 99.67 | 70.23 | 0.4286 | 0.2947 | 0.2681 | 0.3318 | 0.2609 | 0.4074 | 0.2590 | 0.3940 | 0.1920 |
| SRCN3D<sub>VoV</sub> | 102.04 | 67.95 | 0.4205 | 0.2875 | 0.2579 | 0.2827 | 0.2143 | 0.3886 | 0.2274 | 0.3774 | 0.2499 |
| Sparse4D<sub>R101</sub> | 100.01 | 55.04 | 0.5438 | 0.2873 | 0.2611 | 0.3310 | 0.2514 | 0.3984 | 0.2510 | 0.3884 | 0.2259 |
| BEVFusion<sub>Cam</sub> | - | - | 0.4121 | - | - | - | - | - | - | - | - |
| BEVFusion<sub>LiDAR</sub> | - | - | 0.6928 | - | - | - | - | - | - | - | - |
| BEVFusion<sub>C+L</sub> | - | - | 0.7138 | - | - | - | - | - | - | - | - |

BEV Model Calibration

| Model | Pretrain | Temporal | Depth | CBGS | Backbone | BEV Encoder | mCE (%) | mRR (%) | NDS |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DETR3D | | | | | ResNet | Attention | 100.00 | 70.77 | 0.4224 |
| DETR3D<sub>CBGS</sub> | | | | | ResNet | Attention | 99.21 | 70.02 | 0.4341 |
| BEVFormer<sub>Small</sub> | | | | | ResNet | Attention | 101.23 | 59.07 | 0.4787 |
| BEVFormer<sub>Base</sub> | | | | | ResNet | Attention | 97.97 | 60.40 | 0.5174 |
| PETR<sub>R50-p4</sub> | | | | | ResNet | Attention | 111.01 | 61.26 | 0.3665 |
| PETR<sub>VoV-p4</sub> | | | | | VoVNetV2 | Attention | 100.69 | 65.03 | 0.4550 |
| ORA3D | | | | | ResNet | Attention | 99.17 | 68.63 | 0.4436 |
| PolarFormer<sub>R101</sub> | | | | | ResNet | Attention | 96.06 | 70.88 | 0.4602 |
| PolarFormer<sub>VoV</sub> | | | | | VoVNetV2 | Attention | 98.75 | 67.51 | 0.4558 |
| SRCN3D<sub>R101</sub> | | | | | ResNet | CNN+Attn. | 99.67 | 70.23 | 0.4286 |
| SRCN3D<sub>VoV</sub> | | | | | VoVNetV2 | CNN+Attn. | 102.04 | 67.95 | 0.4205 |
| Sparse4D<sub>R101</sub> | | | | | ResNet | CNN+Attn. | 100.01 | 55.04 | 0.5438 |
| BEVDet<sub>R50</sub> | | | | | ResNet | CNN | 115.12 | 51.83 | 0.3770 |
| BEVDet<sub>R101</sub> | | | | | ResNet | CNN | 113.68 | 53.12 | 0.3877 |
| BEVDet<sub>SwinT</sub> | | | | | ResNet | Swin | 116.48 | 46.26 | 0.4037 |
| BEVDepth<sub>R50</sub> | | | | | ResNet | CNN | 110.02 | 56.82 | 0.4058 |
| BEVerse<sub>SwinT</sub> | | | | | ResNet | Swin | 137.25 | 28.24 | 0.1603 |
| BEVerse<sub>SwinT</sub> | | | | | ResNet | Swin | 110.67 | 48.60 | 0.4665 |
| BEVerse<sub>SwinS</sub> | | | | | ResNet | Swin | 132.13 | 29.54 | 0.2682 |
| BEVerse<sub>SwinS</sub> | | | | | ResNet | Swin | 117.82 | 49.57 | 0.4951 |

Note: Pretrain denotes models initialized from the FCOS3D checkpoint. Temporal indicates whether temporal information is used. Depth denotes models with an explicit depth estimation branch. CBGS highlights models that use the class-balanced group-sampling strategy.

Create Corruption Set

You can create your own "RoboBEV" corruption sets! Follow the instructions listed in CREATE.md.
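For a flavor of what a corruption transform looks like, here is a toy low-light corruption on raw pixel intensities. This is purely illustrative: the `darken` function, its gamma/scale parameters, and the severity mapping are made-up examples, not the actual nuScenes-C generation pipeline documented in CREATE.md.

```python
def darken(pixels, severity):
    """Toy low-light corruption: scale intensities down and gamma-compress them.
    `pixels` is a flat list of 8-bit values in [0, 255]; `severity` is 1-3,
    mirroring the benchmark's three severity levels (parameters are illustrative)."""
    gamma = {1: 1.5, 2: 2.0, 3: 3.0}[severity]
    scale = {1: 0.7, 2: 0.5, 3: 0.3}[severity]
    out = []
    for v in pixels:
        x = (v / 255.0) * scale   # reduce overall brightness
        out.append(round((x ** gamma) * 255))  # compress dark regions further
    return out

# Higher severity yields darker images
print(darken([255, 128, 0], 1))
print(darken([255, 128, 0], 3))
```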

TODO List

  • Initial release. 🚀
  • Add scripts for creating common corruptions.
  • Add download link of nuScenes-C.
  • Add evaluation scripts on corruption sets.
  • ...

Citation

If you find this work helpful, please kindly consider citing our paper:

@ARTICLE{xie2023robobev,
  title={RoboBEV: Towards Robust Bird's Eye View Detection under Corruptions},
  author={xxx},
  journal={arXiv preprint arXiv:23xx.xxxxx}, 
  year={2023},
}

License

Creative Commons License
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, while some parts of this codebase may be under other licenses. Please check LICENSE.md carefully if you are using our code for commercial purposes.

Acknowledgements

To be updated.
