YunzeMan / BEVGuide

BEV-Guided Multi-Modality Fusion for Driving Perception (CVPR 2023)

BEV-Guided Multi-Modality Fusion for Driving Perception
Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
https://yunzeman.github.io/BEVGuide/

Abstract: Integrating multiple sensors and addressing diverse tasks in an end-to-end algorithm are challenging yet critical topics for autonomous driving. To this end, we introduce BEVGuide, a novel Bird's-Eye View (BEV) representation learning framework, representing the first attempt to unify a wide range of sensors under direct BEV guidance in an end-to-end fashion. Our architecture accepts input from a diverse sensor pool, including but not limited to camera, LiDAR, and radar sensors, and extracts BEV feature embeddings using a versatile and general transformer backbone. We design a BEV-guided multi-sensor attention block that takes queries from BEV embeddings and learns the BEV representation from sensor-specific features. BEVGuide is efficient due to its lightweight backbone design and highly flexible, as it supports almost any input sensor configuration. Extensive experiments demonstrate that our framework achieves exceptional performance in BEV perception tasks with a diverse sensor set.
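
Since the official code has not been released yet, below is a minimal PyTorch sketch of the BEV-guided multi-sensor attention idea described in the abstract: a grid of learnable BEV queries cross-attends to sensor-specific feature tokens. The `BEVGuidedAttention` class name, tensor shapes, and single-layer design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BEVGuidedAttention(nn.Module):
    """Sketch of a BEV-guided multi-sensor attention block (assumed design).

    Learnable BEV-grid queries attend, via standard cross-attention, to the
    concatenated sensor-specific feature tokens. All names and sizes here
    are hypothetical; the official BEVGuide code is not yet available.
    """

    def __init__(self, bev_h=50, bev_w=50, dim=256, num_heads=8):
        super().__init__()
        # One learnable query per BEV grid cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, sensor_tokens):
        # sensor_tokens: list of (B, N_i, dim) tensors, one per sensor
        # (e.g. camera, LiDAR, radar), already projected to a common dim.
        tokens = torch.cat(sensor_tokens, dim=1)       # (B, sum N_i, dim)
        b = tokens.size(0)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.attn(q, tokens, tokens)          # queries come from BEV
        return self.norm(bev)                          # (B, H*W, dim)

# Usage: fuse hypothetical camera + LiDAR tokens into a 50x50 BEV feature map.
block = BEVGuidedAttention()
cam = torch.randn(2, 600, 256)    # placeholder camera tokens
lidar = torch.randn(2, 400, 256)  # placeholder LiDAR tokens
bev = block([cam, lidar])         # -> (2, 2500, 256)
```

Because the BEV grid supplies the queries, a block of this shape works with any subset of sensors: the key/value token set simply grows or shrinks with the sensor pool, which matches the flexibility claim in the abstract.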

Official PyTorch implementation coming soon

License: MIT License