Yzichen / FlashOCC

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin

Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center


* Please note that the FPS here is measured on an A100 GPU (PyTorch FP32 backend).

News

This repository is an official implementation of FlashOCC and Panoptic-FlashOCC.
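
The "Channel-to-Height" idea named in the title can be pictured as a pure reshape: a 2D BEV feature map whose channel axis packs height and class together is unfolded into a 3D occupancy volume. The sketch below is a minimal NumPy illustration, not the repository's code; the function name `channel_to_height`, the channels-last layout, and the tiny grid sizes are assumptions chosen for readability.

```python
import numpy as np


def channel_to_height(bev_logits, num_z, num_classes):
    """Unfold (B, Dx, Dy, num_z * num_classes) into (B, Dx, Dy, num_z, num_classes).

    Hypothetical helper: the layout is an assumption, not taken from the repo.
    """
    b, dx, dy, c = bev_logits.shape
    if c != num_z * num_classes:
        raise ValueError(f"expected {num_z * num_classes} channels, got {c}")
    # A reshape moves no data; it only reinterprets the channel axis
    # as a (height, class) pair.
    return bev_logits.reshape(b, dx, dy, num_z, num_classes)


# Illustrative sizes only: a 4x4 BEV grid, 16 height bins, 18 classes.
rng = np.random.default_rng(0)
bev = rng.standard_normal((1, 4, 4, 16 * 18)).astype(np.float32)

occ_logits = channel_to_height(bev, num_z=16, num_classes=18)
occ_labels = occ_logits.argmax(axis=-1)  # one class index per voxel

print(occ_logits.shape, occ_labels.shape)
```

Because the last step is only a reshape, all learned layers before it can stay 2D, which is the source of the speed and memory savings the title refers to.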


Main Results

1. FlashOCC

| Config | Backbone | Input Size | mIoU | FPS (Hz) | Flops (G) | Params (M) | Model | Log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BEVDetOCC (1f) | R50 | 256x704 | 31.60 | 92.1 | 241.76 | 29.02 | gdrive | log |
| M0: FlashOCC (1f) | R50 | 256x704 | 31.95 | 197.6 | 154.1 | 39.94 | gdrive | log |
| M1: FlashOCC (1f) | R50 | 256x704 | 32.08 | 152.7 | 248.57 | 44.74 | gdrive | log |
| BEVDetOCC-4D-Stereo (2f) | R50 | 256x704 | 36.1 | - | - | - | baidu | log |
| M2: FlashOCC-4D-Stereo (2f) | R50 | 256x704 | 37.84 | - | - | - | gdrive | log |
| BEVDetOCC-4D-Stereo (2f) | Swin-T | 512x1408 | 42.0 | - | - | - | baidu | log |
| M3: FlashOCC-4D-Stereo (2f) | Swin-T | 512x1408 | 43.52 | - | 1490.77 | 144.99 | gdrive | log |

FPS is measured with TensorRT on a 3090 GPU at FP16 precision. Please refer to Tab. 2 in the paper for the detailed model settings of each M-number.
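
For readers unfamiliar with the mIoU column, the following is a minimal sketch of mean intersection-over-union computed from a confusion matrix. It assumes integer class labels per voxel and averages only over classes that occur; the function name `mean_iou` is hypothetical, and the benchmark's exact protocol (ignored classes, visibility masks) is not reproduced here.

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes that appear in either pred or gt (returns a fraction)."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # confusion[i, j] counts voxels with ground truth i that were predicted as j.
    confusion = np.bincount(
        gt * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    true_pos = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - true_pos
    valid = union > 0  # skip classes absent from both pred and gt
    return float((true_pos[valid] / union[valid]).mean())


# Toy example with three classes and six voxels.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, gt, num_classes=3))
```

Worked by hand, the per-class IoUs in the toy example are 1/3, 2/3, and 1/2, so the mean is 0.5; the tables report the same quantity as a percentage.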

2. Panoptic-FlashOCC

In Panoptic-FlashOCC, we have made the following 4 adjustments to FlashOCC:

  • Training without the camera mask. Using the mask significantly improves prediction performance in the visible region, but at the expense of prediction in the invisible region.
  • Using category balancing.
  • Using stronger loss settings.
  • Introducing instance centers for panoptic occupancy.
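
To make the first two adjustments concrete, here is a minimal sketch of a class-weighted cross-entropy with an optional visibility mask. It is an illustration under stated assumptions, not the repository's loss: inverse-frequency weighting is just one common way to balance categories, and the names `class_weights` and `weighted_cross_entropy` are hypothetical.

```python
import numpy as np


def class_weights(labels, num_classes):
    """Inverse-frequency weights, scaled so present classes average to 1 (assumed scheme)."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(np.float64)
    present = counts > 0
    inv = np.where(present, 1.0 / np.maximum(counts, 1.0), 0.0)
    return inv * present.sum() / inv.sum()


def weighted_cross_entropy(logits, labels, weights, mask=None):
    """logits: (N, C); labels: (N,); weights: (C,); mask: (N,) bool or None."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    w = weights[labels]
    if mask is not None:
        # With a camera mask, only visible voxels contribute to the loss.
        w = w * mask
    return float((w * nll).sum() / w.sum())


labels = np.array([0, 0, 0, 1])
weights = class_weights(labels, num_classes=2)  # the rare class gets more weight
logits = np.zeros((4, 2))

loss_all = weighted_cross_entropy(logits, labels, weights)  # no mask: all voxels
visible = np.array([True, True, False, True])
loss_visible = weighted_cross_entropy(logits, labels, weights, mask=visible)
print(weights, loss_all, loss_visible)
```

Passing `mask=None` corresponds to the first adjustment (every voxel is supervised, visible or not), while passing a mask restricts supervision to the visible region.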

More results for different configurations will be released soon.

| Config | Backbone | Input Size | RayIoU | RayPQ | mIoU | FPS (Hz) | Flops (G) | Params (M) | Model | Log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M1: FlashOCC (1f) | R50 | 256x704 | - | - | 15.41 | - | 248.57 | 44.74 | gdrive | log |
| Panoptic-FlashOCC-Depth-tiny (1f) | R50 | 256x704 | 34.57 | - | 28.83 | 43.9 | 175.00 | 45.32 | gdrive | log |
| Panoptic-FlashOCC-Depth-tiny-Pano (1f) | R50 | 256x704 | 34.81 | 12.9 | 29.14 | 39.8 | 175.00 | 45.32 | gdrive | log |
| Panoptic-FlashOCC-Depth (1f) | R50 | 256x704 | 34.93 | - | 28.91 | 38.7 | 269.47 | 50.12 | gdrive | log |
| Panoptic-FlashOCC-Depth-Pano (1f) | R50 | 256x704 | 35.22 | 13.2 | 29.39 | 35.2 | 269.47 | 50.12 | gdrive | log |
| Panoptic-FlashOCC-4D-Depth (2f) | R50 | 256x704 | 35.99 | - | 29.57 | 35.9 | - | - | gdrive | log |
| Panoptic-FlashOCC-4D-Depth-Pano (2f) | R50 | 256x704 | 36.76 | 14.5 | 30.31 | 30.4 | - | - | gdrive | log |
| Panoptic-FlashOCC-4DLongterm-Depth (8f) | R50 | 256x704 | 38.51 | - | 31.49 | 35.6 | - | - | gdrive | log |
| Panoptic-FlashOCC-4DLongterm-Depth-Pano (8f) | R50 | 256x704 | 38.50 | 16.0 | 31.57 | 30.2 | - | - | gdrive | log |

  • Please note that the FPS here is measured on an A100 GPU (PyTorch FP32 backend).
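
The "-Pano" configurations add instance information on top of semantic occupancy. One plausible reading of the instance-center adjustment is that each voxel of a countable ("thing") class is grouped with its nearest predicted center. The sketch below illustrates only that grouping step under this assumption; the function `assign_instances` and its inputs are hypothetical, and center detection itself is not shown.

```python
import numpy as np


def assign_instances(voxel_xy, is_thing, centers_xy):
    """Label each thing voxel with the 1-based index of its nearest center.

    Voxels that are not thing classes keep instance id 0.
    """
    ids = np.zeros(len(voxel_xy), dtype=np.int64)
    if len(centers_xy) == 0 or not is_thing.any():
        return ids
    # Pairwise squared distances between thing voxels and centers: (num_thing, num_centers).
    diff = voxel_xy[is_thing, None, :] - centers_xy[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    ids[is_thing] = dist2.argmin(axis=1) + 1
    return ids


# Toy BEV coordinates: two clusters of thing voxels and one background voxel.
voxel_xy = np.array([[0, 0], [1, 0], [9, 9], [10, 10], [5, 5]], dtype=np.float64)
is_thing = np.array([True, True, True, True, False])
centers_xy = np.array([[0.5, 0.0], [9.5, 9.5]])

instance_ids = assign_instances(voxel_xy, is_thing, centers_xy)
print(instance_ids)
```

Combining these instance ids with the per-voxel semantic labels gives a panoptic labeling, which is what a panoptic quality metric such as the RayPQ column evaluates.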

Get Started

  1. Environment Setup
  2. Model Training
  3. Quick Test Via TensorRT In MMDeploy
| Backend | mIoU | FPS (Hz) |
| --- | --- | --- |
| PyTorch-FP32 | 31.95 | - |
| TRT-FP32 | 30.78 | 96.2 |
| TRT-FP16 | 30.78 | 197.6 |
| TRT-FP16+INT8(PTQ) | 29.60 | 383.7 |
| TRT-INT8(PTQ) | 29.59 | 397.0 |
  4. Visualization
  • The first row is our prediction and the second row is the ground truth (gt).


A detailed video can be found at baidu

  5. TensorRT Implementation Written in C++ with CUDA Acceleration
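
The INT8 (PTQ) rows in the Quick Test backend table trade a little mIoU for throughput. As a rough intuition for where that accuracy loss comes from, here is a minimal sketch of symmetric post-training quantization with a max-abs calibration scale. This is not how TensorRT calibrates (it has its own calibrators); the helper names and the random "activations" are assumptions for illustration only.

```python
import numpy as np


def calibrate_scale(samples):
    """Pick a scale so the largest calibration magnitude maps to 127."""
    return float(np.abs(samples).max()) / 127.0


def quantize(x, scale):
    """Round to the nearest INT8 step and clip to the symmetric range."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)


def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
calibration = rng.standard_normal(10_000).astype(np.float32)  # stand-in activations
scale = calibrate_scale(calibration)

x = rng.standard_normal(1_000).astype(np.float32)
x_hat = dequantize(quantize(x, scale), scale)

# Inside the calibrated range the rounding error is at most half a step.
in_range = np.abs(x) <= 127 * scale
max_err = float(np.abs(x[in_range] - x_hat[in_range]).max())
print(scale, max_err)
```

In the table, TRT-FP16 is roughly 2x faster than TRT-FP32 (197.6 vs 96.2 Hz) at the same mIoU, while TRT-INT8 is roughly 4x faster (397.0 Hz) with a drop from 30.78 to 29.59 mIoU.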

Acknowledgement

Many thanks to the authors of BEVDet, FB-BEV, RenderOcc, and SparseBEV.

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{yu2023flashocc,
      title={FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin}, 
      author={Zichen Yu and Changyong Shu and Jiajun Deng and Kangjie Lu and Zongdai Liu and Jiangyong Yu and Dawei Yang and Hui Li and Yan Chen},
      year={2023},
      eprint={2311.12058},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
