
Transformation-Equivariant 3D Object Detection for Autonomous Driving

Home Page: https://arxiv.org/abs/2211.11962

This is an improved version of TED that adds a multiple-refinement design. The code is mainly based on OpenPCDet and CasA; some code is taken from PENet and SFD.

Detection Framework

The overall detection framework consists of three parts: (1) a Transformation-equivariant Sparse Convolution (TeSpConv) backbone; (2) Transformation-equivariant Bird's Eye View (TeBEV) pooling; (3) multi-grid pooling and multi-refinement. TeSpConv applies shared weights to multiple transformed point clouds to record transformation-equivariant voxel features. TeBEV pooling aligns and aggregates the scene-level equivariant features into lightweight representations for proposal generation. Multi-grid pooling and multi-refinement align and aggregate the instance-level invariant features for proposal refinement.
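
To make the idea concrete, below is a minimal, illustrative PyTorch sketch (not the repository's actual implementation): a shared-weight feature extractor is applied to several transformed copies of a point cloud, and the per-copy features are then aggregated, loosely mirroring the roles of the TeSpConv backbone and TeBEV pooling. The helper rotate_points_z and the toy backbone are hypothetical names introduced only for this example.

# Illustrative sketch only (not the TED implementation): shared weights over
# multiple transformed copies of a point cloud, followed by aggregation.
import math
import torch
import torch.nn as nn

def rotate_points_z(points, angle):
    # Hypothetical helper: rotate an (N, 3+C) point cloud around the z-axis.
    c, s = math.cos(angle), math.sin(angle)
    rot = points.new_tensor([[c, -s, 0.0],
                             [s,  c, 0.0],
                             [0.0, 0.0, 1.0]])
    out = points.clone()
    out[:, :3] = points[:, :3] @ rot.T
    return out

class ToySharedBackbone(nn.Module):
    # Stand-in for the TeSpConv backbone: the same weights are applied to
    # every transformed copy, yielding one feature set per transformation.
    def __init__(self, in_dim=4, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, point_clouds):
        return [self.mlp(p) for p in point_clouds]

points = torch.randn(1024, 4)                 # x, y, z, intensity
angles = [0.0, math.pi / 2]                   # two rotation hypotheses
copies = [rotate_points_z(points, a) for a in angles]

feats = ToySharedBackbone()(copies)
# Crude stand-in for TeBEV-style aggregation: max over the aligned copies.
scene_feat = torch.stack(feats, dim=0).max(dim=0).values
print(scene_feat.shape)                       # torch.Size([1024, 64])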

Model Zoo

We release two models, based on LiDAR-only and multi-modal data respectively. We denote them as TED-S and TED-M.

  • All models are trained with 8 V100 GPUs and are available for download.

  • The models are trained on the train split (3,712 samples) of the KITTI dataset.

  • The results are the 3D AP (R40) of the Car class on the val split of the KITTI dataset.

  • These models are not intended for directly reporting results on the KITTI test set. For good test-set performance, use a slightly lower score threshold and train on all (or 80% of) the training data (see the configuration sketch after the table below).

Model   Modality     GPU memory (training)   Easy    Mod.    Hard    Download
TED-S   LiDAR only   ~12 GB                  93.25   87.99   86.28   google / baidu (p91t) / 36M
TED-M   LiDAR+RGB    ~15 GB                  95.62   89.24   86.77   google / baidu (nkr5) / 65M
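
As a rough illustration of the score-threshold note above, the snippet below inspects an OpenPCDet-style config from the tools/ directory. The key MODEL.POST_PROCESSING.SCORE_THRESH follows the OpenPCDet convention and is an assumption here; check the TED YAML files for the actual name.

# Run from tools/. Key names follow the OpenPCDet convention and may
# differ in this repository's YAML files.
from pcdet.config import cfg, cfg_from_yaml_file

cfg_from_yaml_file("cfgs/models/kitti/TED-S.yaml", cfg)
print("score threshold:", cfg.MODEL.POST_PROCESSING.SCORE_THRESH)
# Lower this value in the YAML before generating test-set submissions.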

Getting Started

conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator
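
Optionally, a quick sanity check that the environment was set up correctly (this assumes the spconv 2.x wheel installed above; spconv 1.x is imported as plain import spconv):

import torch
import spconv.pytorch as spconv  # spconv 2.x import path

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("spconv SparseConvTensor loaded:", spconv.SparseConvTensor is not None)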

Dependency

Our released implementation is tested on:

  • Ubuntu 18.04
  • Python 3.6.9
  • PyTorch 1.8.1
  • Spconv 1.2.1
  • NVIDIA CUDA 11.1
  • 8x Tesla V100 GPUs

We also tested on:

  • Ubuntu 18.04
  • Python 3.9.13
  • PyTorch 1.8.1
  • Spconv 2.1.22 # pip install spconv-cu111
  • NVIDIA CUDA 11.1
  • 2x 3090 GPUs

Prepare dataset

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows (the road planes, which are optional for data augmentation during training, can be downloaded from [road plane]):

TED
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── pcdet
├── tools

You need to create a 'velodyne_depth' dataset to run our multi-modal detector. You can download our preprocessed data from google (13GB) or baidu (a20o), or generate the data yourself:

  • Install this project.
  • Download the PENet depth completion model here (500M) and put it into tools/PENet.
  • Then run the following command to generate RGB pseudo-points:
cd tools/PENet
python3 main.py --detpath [your path like: ../../data/kitti/training]

After the 'velodyne_depth' data is generated, run the following commands to create the dataset infos:

cd ../..
python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
python3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

After these steps, the data structure should be:

TED
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes) & velodyne_depth
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2 & velodyne_depth
│   │   │── gt_database
│   │   │── gt_database_mm
│   │   │── kitti_dbinfos_train_mm.pkl
│   │   │── kitti_dbinfos_train.pkl
│   │   │── kitti_infos_test.pkl
│   │   │── kitti_infos_train.pkl
│   │   │── kitti_infos_trainval.pkl
│   │   │── kitti_infos_val.pkl
├── pcdet
├── tools
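
A small optional helper (hypothetical, not part of the repository) to verify this layout from the TED root before training:

# Hypothetical check script: verifies the KITTI layout described above.
from pathlib import Path

root = Path("data/kitti")
expected = [
    "ImageSets", "training/calib", "training/velodyne", "training/label_2",
    "training/image_2", "training/velodyne_depth",
    "testing/calib", "testing/velodyne", "testing/image_2", "testing/velodyne_depth",
    "kitti_infos_train.pkl", "kitti_infos_val.pkl", "kitti_infos_trainval.pkl",
    "kitti_infos_test.pkl", "kitti_dbinfos_train.pkl", "kitti_dbinfos_train_mm.pkl",
]
missing = [p for p in expected if not (root / p).exists()]
print("layout OK" if not missing else f"missing: {missing}")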

Installation

git clone https://github.com/hailanyi/TED.git
cd TED
python3 setup.py develop
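
To confirm the development install succeeded, a quick import check (the __version__ attribute is inherited from OpenPCDet's setup and assumed to be present here):

import pcdet
print("pcdet version:", pcdet.__version__)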

Training

Single GPU train:

cd tools
python3 train.py --cfg_file ${CONFIG_FILE}

For example, if you train the TED-S model:

cd tools
python3 train.py --cfg_file cfgs/models/kitti/TED-S.yaml

Multiple GPU train:

You can modify the GPU count in dist_train.sh and then run:

cd tools
sh dist_train.sh

The training logs are saved to log.txt. You can run cat log.txt to view the training progress.

Evaluation

cd tools
python3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}

For example, if you test the TED-S model:

cd tools
python3 test.py --cfg_file cfgs/models/kitti/TED-S.yaml --ckpt TED-S.pth

Multiple GPU test: modify the GPU count in dist_test.sh and then run:

sh dist_test.sh 

The test logs are saved to log-test.txt. You can run cat log-test.txt to view the test results.

License

This code is released under the Apache 2.0 license.

Acknowledgement

CasA

OpenPCDet

PENet

SFD

Citation

@inproceedings{TED,
    title={Transformation-Equivariant 3D Object Detection for Autonomous Driving},
    author={Wu, Hai and Wen, Chenglu and Li, Wei and Yang, Ruigang and Wang, Cheng},
    booktitle={AAAI},
    year={2023}
}
