
3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans includes both Transfer Learning techniques and Scalable Pre-training techniques for tackling the continuous learning problem in autonomous driving:

  1. We implement the Transfer Learning techniques, which consist of four settings:
  • Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  • Active Domain Adaptation (ADA) for 3D Point Clouds
  • Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  • Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
  2. We implement Scalable Pre-training, which continuously enhances model performance on downstream tasks as more pre-training data are fed into the pre-training network.

Team Home:

  • A Team Home page for member information and profiles: Project Link

Overview

News 🔥

  • SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the inspiring experimental results (updated on Sep. 2023).
  • We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
  • We will release all code of AD-PT here; see AD-PT for details (updated on Sep. 2023).
  • We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints for the download links (updated on Aug. 2023).
  • Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, with different baseline models such as PV-RCNN++, SECOND, CenterPoint, and PV-RCNN (updated on Jun. 2023).
  • Our 3DTrans supports Semi-Supervised Domain Adaptation (SSDA) for 3D object detection (updated on Nov. 2022).
  • Our 3DTrans supports Active Domain Adaptation (ADA) for 3D object detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
  • Our 3DTrans supports several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous driving-related model adaptation and transfer.
  • Our 3DTrans supports Multi-dataset Domain Fusion (MDF) for 3D object detection, enabling existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
  • Our 3DTrans supports Unsupervised Domain Adaptation (UDA) for 3D object detection, deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
  • We calculate the distribution of object sizes for each public AD dataset in the object-size statistics.
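
The object-size statistics above can be reproduced from any dataset's ground-truth annotations. Below is a minimal sketch, assuming boxes are given as (N, 7) arrays in the common [x, y, z, dx, dy, dz, heading] layout; the function name and the random data are illustrative and not part of 3DTrans.

```python
import numpy as np

def object_size_stats(gt_boxes, gt_names, class_name="Car"):
    """Compute mean and std of (length, width, height) for one class.

    gt_boxes: (N, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    gt_names: (N,) array of class-name strings aligned with gt_boxes.
    """
    sizes = gt_boxes[gt_names == class_name][:, 3:6]  # (M, 3) -> dx, dy, dz
    return sizes.mean(axis=0), sizes.std(axis=0)

# Illustrative usage with random boxes standing in for real annotations.
boxes = np.random.rand(100, 7) * [50, 50, 3, 5, 2, 2, np.pi]
names = np.array(["Car"] * 100)
mean_size, std_size = object_size_stats(boxes, names)
print("Car mean (l, w, h):", mean_size)
```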

We hope this repository will inspire more research on 3D model generalization, since it pushes the limits of perception performance. 🗼

Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

Getting Started for ALL Settings
  • Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. In addition, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details.

  • Please refer to Readme for UDA for understanding the problem definition of UDA and performing the UDA adaptation process.

  • Please refer to Readme for ADA for understanding the problem definition of ADA and performing the ADA adaptation process.

  • Please refer to Readme for SSDA for understanding the problem definition of SSDA and performing the SSDA adaptation process.

  • Please refer to Readme for MDF for understanding the problem definition of MDF and performing the MDF joint-training process.

  • Please refer to Readme for ReSimAD for the ReSimAD implementation.

  • Please refer to Readme for Scalable Pre-training for starting the journey of 3D perception model pre-training.

Model Zoo

We cannot provide the Waymo-related pretrained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.

Domain Transfer Results

UDA Results

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • Pre-SN indicates that we perform the SN (statistical normalization) operation during the source-only pre-training stage.
  • Post-SN indicates that we perform the SN (statistical normalization) operation during the adaptation stage; a minimal sketch of the SN operation follows these notes.
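
Pre-SN and Post-SN both rely on the same SN idea of rescaling source ground-truth boxes toward the target domain's average object size. The following is a minimal, hedged sketch of that resizing step; the mean sizes and function name are illustrative rather than the exact 3DTrans implementation, which also shifts the LiDAR points inside each box.

```python
import numpy as np

def statistical_normalization(gt_boxes, source_mean_size, target_mean_size):
    """Resize source GT boxes toward the target-domain mean object size.

    gt_boxes: (N, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    Only the box sizes are adjusted here; the full SN operation also moves
    the points inside each box so that they match the resized box.
    """
    delta = np.asarray(target_mean_size) - np.asarray(source_mean_size)
    boxes = gt_boxes.copy()
    boxes[:, 3:6] += delta  # add the per-dimension size difference
    return boxes

# Illustrative mean car sizes (not measured statistics): Waymo cars are
# larger on average than KITTI cars, so the delta shrinks the boxes.
waymo_car_mean = [4.7, 2.1, 1.7]
kitti_car_mean = [3.9, 1.6, 1.5]
```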
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |

ADA Results

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
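
The Bi3D rows below operate under a fixed annotation budget: only 1% or 5% of the target frames are selected for human labeling. As a generic illustration of budgeted active selection, not the actual Bi3D, TQS, or CLUE criteria used in 3DTrans, the sketch below ranks unlabeled frames by a simple uncertainty score and keeps only the budgeted fraction; all names and thresholds are assumptions.

```python
import numpy as np

def select_frames_for_annotation(frame_scores, budget_ratio=0.01):
    """Pick the most uncertain frames under a fixed annotation budget.

    frame_scores: dict mapping frame_id -> list of detection confidences.
    Detections near 0.5 confidence are treated as most uncertain; the top
    `budget_ratio` fraction of frames is returned for annotation.
    """
    def uncertainty(confs):
        confs = np.asarray(confs) if len(confs) else np.array([0.5])
        return float(np.mean(1.0 - np.abs(confs - 0.5) * 2.0))

    ranked = sorted(frame_scores, key=lambda fid: uncertainty(frame_scores[fid]),
                    reverse=True)
    budget = max(1, int(len(ranked) * budget_ratio))
    return ranked[:budget]

# Illustrative usage: three unlabeled frames with mock detection confidences.
scores = {"frame_000": [0.95, 0.91], "frame_001": [0.52, 0.48], "frame_002": [0.70]}
print(select_frames_for_annotation(scores, budget_ratio=0.34))  # -> ['frame_001']
```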
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |

SSDA Results

We report the target-domain results for Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.

  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the data.
  • second_5%_FT denotes that we use 5% nuScenes training data to fine-tune the Second model.
  • second_5%_SESS denotes that we utilize the SESS: Self-Ensembling Semi-Supervised method to adapt our baseline model.
  • second_5%_PS denotes that we fine-tune the source-only model on the nuScenes dataset using 5% labeled data, and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (a minimal pseudo-labeling sketch follows this list).
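
The _PS rows rely on a standard pseudo-labeling loop: run the fine-tuned model on the unlabeled 95% of nuScenes, keep only confident detections as pseudo ground truth, and re-train on them together with the labeled split. Below is a minimal, hedged sketch of the filtering step; the threshold and data structures are illustrative and do not mirror the exact 3DTrans code.

```python
import numpy as np

def filter_pseudo_labels(predictions, score_threshold=0.7):
    """Keep only confident detections as pseudo ground-truth labels.

    predictions: list of per-frame dicts with 'boxes' (N, 7), 'scores' (N,)
    and 'labels' (N,) numpy arrays. The surviving detections are then mixed
    with the 5% labeled split for another round of training.
    """
    pseudo_labels = []
    for pred in predictions:
        keep = pred["scores"] >= score_threshold
        pseudo_labels.append({k: v[keep] for k, v in pred.items()})
    return pseudo_labels

# Illustrative usage with one mock frame.
frame = {"boxes": np.zeros((2, 7)),
         "scores": np.array([0.9, 0.3]),
         "labels": np.array([1, 2])}
print(filter_pseudo_labels([frame])[0]["scores"])  # -> [0.9]
```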
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |

  • For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
  • PS denotes that we pseudo-label the unlabeled ONCE and re-train the model on pseudo-labeled data.
  • SESS denotes that we utilize the SESS method to adapt the baseline.
  • For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
| Model | ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|---|---|---|---|---|---|---|
| CenterPoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| CenterPoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |

MDF Results

Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using mAP/mAPH LEVEL_2 and on nuScenes using BEV/3D AP. Please refer to Readme for MDF for more results.

  • All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
  • The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the training data to save training time.
  • PV-RCNN-nuScenes indicates that we train the PV-RCNN model only on the nuScenes dataset; PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset; and PV-RCNN-DT denotes domain attention-aware multi-dataset training (a minimal direct-merging sketch follows this list).
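
For the Direct Merging baseline in the tables below, merging two AD datasets mainly means mapping their class names into a shared label space and concatenating the samples. The sketch below illustrates that idea with torch's ConcatDataset, using hypothetical mappings and toy datasets rather than the actual 3DTrans dataloaders, which additionally handle per-dataset point-cloud ranges and sensor differences (the problem that Uni3D and Domain Attention address).

```python
from torch.utils.data import ConcatDataset, Dataset

# Hypothetical per-dataset class names mapped into one shared label space.
SHARED_CLASSES = {"Vehicle": 0, "Pedestrian": 1, "Cyclist": 2}
NUSC_TO_SHARED = {"car": "Vehicle", "truck": "Vehicle",
                  "pedestrian": "Pedestrian", "bicycle": "Cyclist"}
WAYMO_TO_SHARED = {"TYPE_VEHICLE": "Vehicle", "TYPE_PEDESTRIAN": "Pedestrian",
                   "TYPE_CYCLIST": "Cyclist"}

def remap_labels(names, mapping):
    """Map dataset-specific class names to shared label ids (-1 = dropped class)."""
    return [SHARED_CLASSES.get(mapping.get(n, ""), -1) for n in names]

class RemappedDataset(Dataset):
    """Wrap one source dataset and translate its labels into the shared space."""
    def __init__(self, samples, mapping):
        self.samples, self.mapping = samples, mapping
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        points, names = self.samples[idx]
        return points, remap_labels(names, self.mapping)

# Toy samples standing in for real frames: (points, class names).
waymo = RemappedDataset([([[0, 0, 0]], ["TYPE_VEHICLE"])], WAYMO_TO_SHARED)
nusc = RemappedDataset([([[1, 1, 1]], ["car", "barrier"])], NUSC_TO_SHARED)
merged = ConcatDataset([waymo, nusc])  # joint training iterates over both sources
print(len(merged), merged[1])  # -> 2 ([[1, 1, 1]], [0, -1])
```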
| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
| PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
| PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
| PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
| PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
| Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
| Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
| Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
| Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
| PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
| PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |

3D Pre-training Results

AD-PT Results on Waymo

AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones using AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different downstream datasets. Here, we report the results of fine-tuning on Waymo.
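
Fine-tuning from an AD-PT checkpoint amounts to loading the pre-trained backbone weights into the downstream detector before training on the labeled split. The following is a minimal sketch with PyTorch; the checkpoint path, key prefixes, and wrapper key are assumptions rather than the exact 3DTrans checkpoint layout.

```python
import torch

def load_pretrained_backbone(model, ckpt_path, prefixes=("backbone_3d", "backbone_2d")):
    """Copy matching backbone weights from a pre-trained checkpoint into `model`.

    Only parameters whose names start with one of `prefixes` and whose shapes
    match are loaded; detection heads keep their random initialization and are
    learned during fine-tuning.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model_state", state)  # some checkpoints wrap the weights
    own = model.state_dict()
    loaded = {k: v for k, v in state.items()
              if k.startswith(prefixes) and k in own and own[k].shape == v.shape}
    own.update(loaded)
    model.load_state_dict(own)
    return sorted(loaded)

# Usage (names are placeholders, not actual 3DTrans files):
# loaded_keys = load_pretrained_backbone(detector, "ad_pt_pretrained.pth")
# print(f"loaded {len(loaded_keys)} backbone tensors")
```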

| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|---|---|---|---|---|---|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |

ReSimAD

ReSimAD Implementation

Here, we provide the Download Link of our reconstruction-simulation dataset produced by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.

Specifically, please refer to ReSimAD reconstruction for the point-based reconstructed meshes, and PCSim for the technical details of simulating the target-domain-like points based on the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.

We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.

| Methods | Training time | Adaptation | Car@R40 (BEV / 3D) | Ckpt |
|---|---|---|---|---|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |

Visualization Tools for 3DTrans

  • Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results or ground truth of a selected scene.
Visualization Demo
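
For a quick standalone look at a single frame outside the Quick Sequence Demo tool, a point cloud and its predicted boxes can be rendered with Open3D. The snippet below is a generic sketch, not the 3DTrans demo code, and assumes boxes in the [x, y, z, dx, dy, dz, heading] layout.

```python
import numpy as np
import open3d as o3d

def show_frame(points_xyz, boxes):
    """Render one LiDAR frame with oriented 3D boxes.

    points_xyz: (N, 3) array of point coordinates.
    boxes: (M, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)

    geometries = [pcd]
    for x, y, z, dx, dy, dz, yaw in boxes:
        rot = o3d.geometry.get_rotation_matrix_from_xyz((0.0, 0.0, yaw))
        obb = o3d.geometry.OrientedBoundingBox((x, y, z), rot, (dx, dy, dz))
        obb.color = (1.0, 0.0, 0.0)  # draw predicted boxes in red
        geometries.append(obb)

    o3d.visualization.draw_geometries(geometries)

# Illustrative call with random points and one mock box (opens a viewer window).
# show_frame(np.random.rand(1000, 3) * 50, np.array([[10, 0, 0, 4, 2, 1.6, 0.3]]))
```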

Acknowledgements

  • Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.

  • Our pre-training 3D point cloud task is based on the ONCE Dataset. Thanks to the ONCE Development Team for their inspiring data release.

Technical Papers

@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9253--9262},
  year={2023}
}
@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15599--15608},
  year={2023}
}
@inproceedings{yuan2023AD-PT,
  title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
@article{zhang2023resimad,
  title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
  author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
  journal={arXiv preprint arXiv:2309.05527},
  year={2023}
}
@article{yan2023spot,
  title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
  author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
  journal={arXiv preprint arXiv:2309.10527},
  year={2023}
}
@inproceedings{huang2023sug,
  title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
  author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

License

This project is released under the Apache License 2.0.

