Fully Sparse Fusion for 3D Object Detection (TPAMI 2024)

A multi-modal exploration on the paradigm of fully sparse 3D object detection


Installation

First, initialize the conda environment:

conda create -n FSF python=3.8 -y
conda activate FSF
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Then, install mmdet3d and its dependencies:

pip install mmcv-full==1.3.9
pip install mmdet==2.14.0
pip install mmsegmentation==0.30.0
# modified version of mmdetection3d
git clone https://gitee.com/liyingyanUCAS/mmdetection3d.git
cd mmdetection3d
pip install -v -e .
# some other packages
mkdir pkgs && cd pkgs
git clone https://github.com/Abyssaledge/TorchEx.git
cd TorchEx
pip install -v -e .
pip install spconv-cu114
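
Once the packages are installed, a quick sanity check can catch version drift before a long training run. The following is a minimal sketch that compares installed versions with the pins from the commands above, using the standard-library `importlib.metadata`. The `PINNED` table and the helpers `installed_version` and `find_mismatches` are illustrative names and are not part of this repository.

```python
from importlib import metadata

# Version pins copied from the install commands above.
PINNED = {
    "torch": "1.9.0+cu111",
    "torchvision": "0.10.0+cu111",
    "torchaudio": "0.9.0",
    "mmcv-full": "1.3.9",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.30.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) pair."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package matches. The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.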

Data Preparation

First, create the data directory:

mkdir data

Then, please download the nuScenes and Argoverse 2 datasets and organize the data directory as follows:

├── data
│   ├── nuscenes
│   │   ├── samples
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── CAM_BACK_RIGHT
│   │   │   ├── CAM_FRONT
│   │   │   ├── CAM_FRONT_LEFT
│   │   │   ├── CAM_FRONT_RIGHT
│   │   │   ├── LIDAR_TOP
│   │   ├── sweeps
│   │   │   ├── CAM_BACK
│   │   │   ├── CAM_BACK_LEFT
│   │   │   ├── CAM_BACK_RIGHT
│   │   │   ├── CAM_FRONT
│   │   │   ├── CAM_FRONT_LEFT
│   │   │   ├── CAM_FRONT_RIGHT
│   │   │   ├── LIDAR_TOP
│   │   ├── v1.0-train
│   │   ├── v1.0-val
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_trainval.pkl
│   ├── argo2
│   │   ├── argo2_format
│   │   │   ├── sensor
│   │   │   │   ├── train
│   │   │   │   │   ├── ...
│   │   │   │   ├── val
│   │   │   │   │   ├── ...
│   │   │   │   ├── test
│   │   │   │   │   ├── 0c6e62d7-bdfa-3061-8d3d-03b13aa21f68
│   │   │   │   │   ├── 0f0cdd79-bc6c-35cd-9d99-7ae2fc7e165c
│   │   │   │   │   ├── ...
│   │   │   │   ├── val_anno.feather
│   │   ├── kitti_format
│   │   │   ├── argo2_infos_train.pkl
│   │   │   ├── argo2_infos_val.pkl
│   │   │   ├── argo2_infos_test.pkl
│   │   │   ├── argo2_infos_trainval.pkl
│   │   │   ├── training
│   │   │   ├── testing
│   │   │   ├── argo2_gt_database
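
Before launching anything expensive, it can help to confirm that the layout above is in place. The sketch below checks a subset of the entries from the tree with `pathlib` and reports the ones that are missing. The `EXPECTED` list and the `missing_entries` helper are illustrative and not part of this repository; extend the list as needed.

```python
from pathlib import Path

# Relative paths taken from the directory tree above (a subset, for brevity).
EXPECTED = [
    "nuscenes/samples/CAM_FRONT",
    "nuscenes/sweeps/CAM_FRONT",
    "nuscenes/nuscenes_infos_train.pkl",
    "nuscenes/nuscenes_infos_val.pkl",
    "argo2/argo2_format/sensor/train",
    "argo2/argo2_format/sensor/val",
    "argo2/kitti_format/argo2_infos_train.pkl",
    "argo2/kitti_format/argo2_infos_val.pkl",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to data/.
    for rel in missing_entries("data"):
        print(f"missing: data/{rel}")
```

The check only tests for existence, so symlinked dataset directories pass as long as their targets resolve.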

For the argo2 pickles, you can either use the pickles we provide or generate them yourself. To generate them yourself, please run the following command:

python tools/AV2/argo2_pickle_mmdet_fusion.py
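
After generating or downloading a pickle, a quick look at its top-level object can confirm that the file is readable and not empty. The sketch below makes no assumption about the internal structure of the argo2 info files; it only reports the type and length of whatever was stored. `summarize_pickle` is an illustrative helper, not part of this repository.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle and return (type name, number of top-level entries).

    The entry count is None when the top-level object has no length.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)  # only load pickles from a source you trust
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # Path taken from the directory tree in this README.
    path = Path("data/argo2/kitti_format/argo2_infos_val.pkl")
    if path.exists():
        print(path, *summarize_pickle(path))
```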

Please download the pretrained models and other files from Google Drive.

Then, please organize the ckpt directory as follows:

├── ckpt
│   ├── fsd_argo_pretrain.pth
│   ├── fsd_nusc_pretrain.pth
│   ├── htc_x101_64x4d_fpn_dconv_c3-c5_coco-20e_16x1_20e_nuim_20201008_211222-0b16ac4b.pth
│   ├── htc_x101_64x4d_fpn_dconv_c3-c5_coco-20e_16x1_20e_nuim.py

Then use our scripts to run 2D inference in advance and save the resulting masks:

./tools/mask_tools/save_mask_nusc.sh
./tools/mask_tools/save_mask_argo2.sh

Train and Test

nuScenes

After the preparation, you can train our model with 8 GPUs on nuScenes using:

./tools/nusc_train.sh nuScenes/FSF_nuScenes_config 8

For testing, please run the command:

./tools/dist_test.sh projects/configs/nuScenes/FSF_nuScenes_config.py $CKPT_PATH 8

Argoverse 2

For training on Argoverse 2 with 8 GPUs, please use:

./tools/argo_train.sh Argoverse2/FSF_AV2_config 8

For testing, please run:

./tools/dist_test.sh projects/configs/Argoverse2/FSF_AV2_config.py $CKPT_PATH 8
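
When evaluating several checkpoints, it may be convenient to assemble the test command programmatically. The sketch below builds the argument list in the order used above (config, checkpoint, number of GPUs). `build_test_command` and the example checkpoint path are hypothetical and not part of this repository.

```python
import shlex


def build_test_command(config, checkpoint, num_gpus=8):
    """Assemble the argument list for ./tools/dist_test.sh."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["./tools/dist_test.sh", str(config), str(checkpoint), str(num_gpus)]


if __name__ == "__main__":
    cmd = build_test_command(
        "projects/configs/nuScenes/FSF_nuScenes_config.py",
        "work_dirs/example/latest.pth",  # hypothetical checkpoint path
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch it, call subprocess.run(cmd, check=True) from the repository root.
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems when a checkpoint path contains spaces.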

Results

DATASET    mAP    NDS    CDS
nuScenes   70.8   73.2   -
AV2        33.2   -      25.5

Citation

Please consider citing our work as follows if it is helpful.

@article{li2024fully,
  title={Fully sparse fusion for 3d object detection},
  author={Li, Yingyan and Fan, Lue and Liu, Yang and Huang, Zehao and Chen, Yuntao and Wang, Naiyan and Zhang, Zhaoxiang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

This project is based on the following codebases.

License: MIT License