CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [CoRL 2022]

Paper | Supplement | Video

This is the official implementation of the CoRL 2022 paper "CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers" by Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, and Jiaqi Ma.

UCLA, UT-Austin


Overview of CoBEVT

Introduction

CoBEVT is the first generic multi-agent, multi-camera perception framework that cooperatively generates BEV map predictions. Its core component, the fused axial attention (FAX) module, sparsely captures local and global spatial interactions across views and agents. CoBEVT achieves state-of-the-art performance on both the OPV2V and nuScenes datasets while running in real time.
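To make the local/global split concrete, here is a minimal NumPy sketch of the two sparse attention patterns that fused axial attention combines: window attention over contiguous local blocks and grid attention over strided tokens spanning the whole BEV map. This is an illustrative simplification (single head, no learned projections, no normalization or residuals), not the actual CoBEVT implementation; `fax_sketch` and `window` are names chosen here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # x: (num_groups, tokens_per_group, dim)
    # Plain scaled dot-product self-attention within each group.
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ x

def fax_sketch(feat, window=4):
    # feat: (H, W, C) BEV feature map; H and W must be divisible by `window`.
    H, W, C = feat.shape
    w = window

    # Local branch: attention within each contiguous (w x w) window.
    local = feat.reshape(H // w, w, W // w, w, C)
    local = local.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)
    local = attention(local)
    local = local.reshape(H // w, W // w, w, w, C)
    local = local.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

    # Global branch: attention over a sparse (w x w) grid of tokens
    # strided across the whole map, so each group sees global context.
    grid = feat.reshape(w, H // w, w, W // w, C)
    grid = grid.transpose(1, 3, 0, 2, 4).reshape(-1, w * w, C)
    grid = attention(grid)
    grid = grid.reshape(H // w, W // w, w, w, C)
    grid = grid.transpose(2, 0, 3, 1, 4).reshape(H, W, C)

    # Fuse the two branches (learned fusion omitted in this sketch).
    return local + grid

feat = np.random.default_rng(0).normal(size=(8, 8, 16))
out = fax_sketch(feat, window=4)
print(out.shape)  # (8, 8, 16)
```

Because each token only attends to `window * window` others in each branch, the cost grows linearly with the number of BEV tokens rather than quadratically, which is what makes real-time operation feasible.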


nuScenes demo: CoBEVT applied to single-vehicle, multi-camera semantic BEV segmentation.


OPV2V demo: CoBEVT applied to multi-agent BEV map prediction.

Installation

The pipelines for the nuScenes and OPV2V datasets are different. Please refer to the folder matching your research purpose for details.

👉 nuScenes Users
👉 OPV2V Users

Citation

@inproceedings{xu2022cobevt,
  author = {Xu, Runsheng and Tu, Zhengzhong and Xiang, Hao and Shao, Wei and Zhou, Bolei and Ma, Jiaqi},
  title = {CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers},
  booktitle = {Conference on Robot Learning (CoRL)},
  year = {2022}
}

Acknowledgement

CoBEVT is built upon OpenCOOD, the first Open Cooperative Detection framework for autonomous driving.

Our nuScenes experiments use the training pipeline from CVT (CVPR 2022).


License: Apache License 2.0
