
3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans includes both Transfer Learning techniques and Scalable Pre-training techniques for tackling the continuous learning problem in autonomous driving:

  1. We implement the Transfer Learning techniques, which consist of four settings:
  • Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  • Active Domain Adaptation (ADA) for 3D Point Clouds
  • Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  • Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
  2. We implement Scalable Pre-training, which continuously enhances model performance on downstream tasks as more pre-training data are fed into the pre-training network.

Team Home:

  • A Team Home page for member information and profiles: Project Link

Overview

News 🔥

  • SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the inspiring experimental results (updated on Sep. 2023).
  • We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
  • We will release all code of AD-PT here; see AD-PT for details (updated on Sep. 2023).
  • We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints for the download links (updated on Aug. 2023).
  • Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, with different baseline models such as PV-RCNN++, SECOND, CenterPoint, and PV-RCNN (updated on Jun. 2023).
  • Our 3DTrans supports Semi-Supervised Domain Adaptation (SSDA) for 3D object detection (updated on Nov. 2022).
  • Our 3DTrans supports Active Domain Adaptation (ADA) for 3D object detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
  • Our 3DTrans supports several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous driving-related model adaptation and transfer.
  • Our 3DTrans supports Multi-dataset Domain Fusion (MDF) for 3D object detection, enabling existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
  • Our 3DTrans supports Unsupervised Domain Adaptation (UDA) for 3D object detection, deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
  • We calculate the distribution of object sizes for each public AD dataset in the object-size statistics.
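
The object-size statistics above can be reproduced from any dataset's ground-truth annotations. Below is a minimal sketch, assuming boxes are given as (N, 7) arrays in the common [x, y, z, dx, dy, dz, heading] layout; the function name and the random data are illustrative and not part of 3DTrans.

```python
import numpy as np

def object_size_stats(gt_boxes, gt_names, class_name="Car"):
    """Compute mean and std of (length, width, height) for one class.

    gt_boxes: (N, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    gt_names: (N,) array of class-name strings aligned with gt_boxes.
    """
    sizes = gt_boxes[gt_names == class_name][:, 3:6]  # (M, 3) -> dx, dy, dz
    return sizes.mean(axis=0), sizes.std(axis=0)

# Illustrative usage with random boxes standing in for real annotations.
boxes = np.random.rand(100, 7) * [50, 50, 3, 5, 2, 2, np.pi]
names = np.array(["Car"] * 100)
mean_size, std_size = object_size_stats(boxes, names)
print("Car mean (l, w, h):", mean_size)
```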

We hope this repository will inspire more research on 3D model generalization, since it pushes the limits of perception performance. 🗼

Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

Getting Started for ALL Settings
  • Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. In addition, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details.

  • Please refer to Readme for UDA for understanding the problem definition of UDA and performing the UDA adaptation process.

  • Please refer to Readme for ADA for understanding the problem definition of ADA and performing the ADA adaptation process.

  • Please refer to Readme for SSDA for understanding the problem definition of SSDA and performing the SSDA adaptation process.

  • Please refer to Readme for MDF for understanding the problem definition of MDF and performing the MDF joint-training process.

  • Please refer to Readme for ReSimAD for the ReSimAD implementation.

  • Please refer to Readme for Scalable Pre-training for starting the journey of 3D perception model pre-training.

Model Zoo

We cannot provide the Waymo-related pretrained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.

Domain Transfer Results

UDA Results

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • Pre-SN indicates that we perform the SN (statistical normalization) operation during the source-only pre-training stage.
  • Post-SN indicates that we perform the SN (statistical normalization) operation during the adaptation stage; a minimal sketch of the SN operation follows these notes.
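
Pre-SN and Post-SN both rely on the same SN idea of rescaling source ground-truth boxes toward the target domain's average object size. The following is a minimal, hedged sketch of that resizing step; the mean sizes and function name are illustrative rather than the exact 3DTrans implementation, which also shifts the LiDAR points inside each box.

```python
import numpy as np

def statistical_normalization(gt_boxes, source_mean_size, target_mean_size):
    """Resize source GT boxes toward the target-domain mean object size.

    gt_boxes: (N, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    Only the box sizes are adjusted here; the full SN operation also moves
    the points inside each box so that they match the resized box.
    """
    delta = np.asarray(target_mean_size) - np.asarray(source_mean_size)
    boxes = gt_boxes.copy()
    boxes[:, 3:6] += delta  # add the per-dimension size difference
    return boxes

# Illustrative mean car sizes (not measured statistics): Waymo cars are
# larger on average than KITTI cars, so the delta shrinks the boxes.
waymo_car_mean = [4.7, 2.1, 1.7]
kitti_car_mean = [3.9, 1.6, 1.5]
```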
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |

ADA Results

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
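
The Bi3D rows below operate under a fixed annotation budget: only 1% or 5% of the target frames are selected for human labeling. As a generic illustration of budgeted active selection, not the actual Bi3D, TQS, or CLUE criteria used in 3DTrans, the sketch below ranks unlabeled frames by a simple uncertainty score and keeps only the budgeted fraction; all names and thresholds are assumptions.

```python
import numpy as np

def select_frames_for_annotation(frame_scores, budget_ratio=0.01):
    """Pick the most uncertain frames under a fixed annotation budget.

    frame_scores: dict mapping frame_id -> list of detection confidences.
    Detections near 0.5 confidence are treated as most uncertain; the top
    `budget_ratio` fraction of frames is returned for annotation.
    """
    def uncertainty(confs):
        confs = np.asarray(confs) if len(confs) else np.array([0.5])
        return float(np.mean(1.0 - np.abs(confs - 0.5) * 2.0))

    ranked = sorted(frame_scores, key=lambda fid: uncertainty(frame_scores[fid]),
                    reverse=True)
    budget = max(1, int(len(ranked) * budget_ratio))
    return ranked[:budget]

# Illustrative usage: three unlabeled frames with mock detection confidences.
scores = {"frame_000": [0.95, 0.91], "frame_001": [0.52, 0.48], "frame_002": [0.70]}
print(select_frames_for_annotation(scores, budget_ratio=0.34))  # -> ['frame_001']
```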
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |

SSDA Results

We report the target-domain results for Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.

  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the data.
  • second_5%_FT denotes that we use 5% nuScenes training data to fine-tune the Second model.
  • second_5%_SESS denotes that we utilize the SESS: Self-Ensembling Semi-Supervised method to adapt our baseline model.
  • second_5%_PS denotes that we fine-tune the source-only model on the nuScenes dataset using 5% labeled data, and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (a minimal pseudo-labeling sketch follows this list).
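
The _PS rows rely on a standard pseudo-labeling loop: run the fine-tuned model on the unlabeled 95% of nuScenes, keep only confident detections as pseudo ground truth, and re-train on them together with the labeled split. Below is a minimal, hedged sketch of the filtering step; the threshold and data structures are illustrative and do not mirror the exact 3DTrans code.

```python
import numpy as np

def filter_pseudo_labels(predictions, score_threshold=0.7):
    """Keep only confident detections as pseudo ground-truth labels.

    predictions: list of per-frame dicts with 'boxes' (N, 7), 'scores' (N,)
    and 'labels' (N,) numpy arrays. The surviving detections are then mixed
    with the 5% labeled split for another round of training.
    """
    pseudo_labels = []
    for pred in predictions:
        keep = pred["scores"] >= score_threshold
        pseudo_labels.append({k: v[keep] for k, v in pred.items()})
    return pseudo_labels

# Illustrative usage with one mock frame.
frame = {"boxes": np.zeros((2, 7)),
         "scores": np.array([0.9, 0.3]),
         "labels": np.array([1, 2])}
print(filter_pseudo_labels([frame])[0]["scores"])  # -> [0.9]
```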
| Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download |
|---|---|---|---|---|
| Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |

  • For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
  • PS denotes that we pseudo-label the unlabeled ONCE and re-train the model on pseudo-labeled data.
  • SESS denotes that we utilize the SESS method to adapt the baseline.
  • For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
| Model | ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|---|---|---|---|---|---|---|
| CenterPoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| CenterPoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |

MDF Results

Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using mAP/mAPH LEVEL_2 and on nuScenes using BEV/3D AP. Please refer to Readme for MDF for more results.

  • All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
  • The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the training data to save training time.
  • PV-RCNN-nuScenes indicates that we train the PV-RCNN model only on the nuScenes dataset; PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset; and PV-RCNN-DT denotes domain attention-aware multi-dataset training (a minimal direct-merging sketch follows this list).
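
For the Direct Merging baseline in the tables below, merging two AD datasets mainly means mapping their class names into a shared label space and concatenating the samples. The sketch below illustrates that idea with torch's ConcatDataset, using hypothetical mappings and toy datasets rather than the actual 3DTrans dataloaders, which additionally handle per-dataset point-cloud ranges and sensor differences (the problem that Uni3D and Domain Attention address).

```python
from torch.utils.data import ConcatDataset, Dataset

# Hypothetical per-dataset class names mapped into one shared label space.
SHARED_CLASSES = {"Vehicle": 0, "Pedestrian": 1, "Cyclist": 2}
NUSC_TO_SHARED = {"car": "Vehicle", "truck": "Vehicle",
                  "pedestrian": "Pedestrian", "bicycle": "Cyclist"}
WAYMO_TO_SHARED = {"TYPE_VEHICLE": "Vehicle", "TYPE_PEDESTRIAN": "Pedestrian",
                   "TYPE_CYCLIST": "Cyclist"}

def remap_labels(names, mapping):
    """Map dataset-specific class names to shared label ids (-1 = dropped class)."""
    return [SHARED_CLASSES.get(mapping.get(n, ""), -1) for n in names]

class RemappedDataset(Dataset):
    """Wrap one source dataset and translate its labels into the shared space."""
    def __init__(self, samples, mapping):
        self.samples, self.mapping = samples, mapping
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        points, names = self.samples[idx]
        return points, remap_labels(names, self.mapping)

# Toy samples standing in for real frames: (points, class names).
waymo = RemappedDataset([([[0, 0, 0]], ["TYPE_VEHICLE"])], WAYMO_TO_SHARED)
nusc = RemappedDataset([([[1, 1, 1]], ["car", "barrier"])], NUSC_TO_SHARED)
merged = ConcatDataset([waymo, nusc])  # joint training iterates over both sources
print(len(merged), merged[1])  # -> 2 ([[1, 1, 1]], [0, -1])
```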
| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
| PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
| PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
| PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
| PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
| Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
| Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
| Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
| Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
| PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
| PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |

3D Pre-training Results

AD-PT Results on Waymo

AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones using AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different downstream datasets. Here, we report the results of fine-tuning on Waymo.
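
Fine-tuning from an AD-PT checkpoint amounts to loading the pre-trained backbone weights into the downstream detector before training on the labeled split. The following is a minimal sketch with PyTorch; the checkpoint path, key prefixes, and wrapper key are assumptions rather than the exact 3DTrans checkpoint layout.

```python
import torch

def load_pretrained_backbone(model, ckpt_path, prefixes=("backbone_3d", "backbone_2d")):
    """Copy matching backbone weights from a pre-trained checkpoint into `model`.

    Only parameters whose names start with one of `prefixes` and whose shapes
    match are loaded; detection heads keep their random initialization and are
    learned during fine-tuning.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model_state", state)  # some checkpoints wrap the weights
    own = model.state_dict()
    loaded = {k: v for k, v in state.items()
              if k.startswith(prefixes) and k in own and own[k].shape == v.shape}
    own.update(loaded)
    model.load_state_dict(own)
    return sorted(loaded)

# Usage (names are placeholders, not actual 3DTrans files):
# loaded_keys = load_pretrained_backbone(detector, "ad_pt_pretrained.pth")
# print(f"loaded {len(loaded_keys)} backbone tensors")
```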

| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|---|---|---|---|---|---|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |

ReSimAD

ReSimAD Implementation

Here, we provide the Download Link of our reconstruction-simulation dataset produced by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.

Specifically, please refer to ReSimAD reconstruction for the point-based reconstructed meshes, and PCSim for the technical details of simulating the target-domain-like points based on the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.

We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.

| Methods | Training time | Adaptation | Car@R40 (BEV / 3D) | Ckpt |
|---|---|---|---|---|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |

Visualization Tools for 3DTrans

  • Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results or ground truth of a selected scene.
Visualization Demo
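
For a quick standalone look at a single frame outside the Quick Sequence Demo tool, a point cloud and its predicted boxes can be rendered with Open3D. The snippet below is a generic sketch, not the 3DTrans demo code, and assumes boxes in the [x, y, z, dx, dy, dz, heading] layout.

```python
import numpy as np
import open3d as o3d

def show_frame(points_xyz, boxes):
    """Render one LiDAR frame with oriented 3D boxes.

    points_xyz: (N, 3) array of point coordinates.
    boxes: (M, 7) array in [x, y, z, dx, dy, dz, heading] layout.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_xyz)

    geometries = [pcd]
    for x, y, z, dx, dy, dz, yaw in boxes:
        rot = o3d.geometry.get_rotation_matrix_from_xyz((0.0, 0.0, yaw))
        obb = o3d.geometry.OrientedBoundingBox((x, y, z), rot, (dx, dy, dz))
        obb.color = (1.0, 0.0, 0.0)  # draw predicted boxes in red
        geometries.append(obb)

    o3d.visualization.draw_geometries(geometries)

# Illustrative call with random points and one mock box (opens a viewer window).
# show_frame(np.random.rand(1000, 3) * 50, np.array([[10, 0, 0, 4, 2, 1.6, 0.3]]))
```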

Acknowledgements

  • Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.

  • Our pre-training 3D point cloud task is based on the ONCE Dataset. Thanks to the ONCE Development Team for their inspiring data release.

Technical Papers

@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9253--9262},
  year={2023}
}
@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15599--15608},
  year={2023}
}
@inproceedings{yuan2023AD-PT,
  title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
@article{zhang2023resimad,
  title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
  author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
  journal={arXiv preprint arXiv:2309.05527},
  year={2023}
}
@article{yan2023spot,
  title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
  author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
  journal={arXiv preprint arXiv:2309.10527},
  year={2023}
}
@inproceedings{huang2023sug,
  title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
  author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

License

This project is released under the Apache License 2.0.

