OV-3DET: An Open Vocabulary 3D DETector.
OV-3DET: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation,
Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer and Shanghang Zhang,
Accepted to CVPR 2023
- Detects 3D objects according to text prompts.
- Training OV-3DET does not require 3D annotation.
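To illustrate what detecting objects "according to text prompts" means, here is a minimal sketch of CLIP-style open-vocabulary classification. It assumes each detected 3D box already has an embedding in the same space as the text encoder, and it scores that embedding against one text embedding per class prompt by cosine similarity. The function names, the toy vectors, and the `temperature` value are hypothetical; this is not the repository's code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_by_prompt(box_feature, prompt_embeddings, temperature=0.01):
    """Score one 3D box embedding against per-class text prompt embeddings.

    box_feature: list[float], a hypothetical per-box embedding in the text space.
    prompt_embeddings: dict[str, list[float]], one (hypothetical) text encoding
        per class prompt, e.g. for "a photo of a chair".
    Returns (best_label, probabilities) from a softmax over cosine similarities.
    """
    f = l2_normalize(box_feature)
    labels = list(prompt_embeddings)
    sims = []
    for label in labels:
        t = l2_normalize(prompt_embeddings[label])
        sims.append(sum(a * b for a, b in zip(f, t)))  # cosine similarity

    # Temperature-scaled softmax, shifted by the max logit for numerical stability.
    logits = [s / temperature for s in sims]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    # Toy 3-D "embeddings"; real text/box embeddings would be much larger.
    prompts = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "sofa": [0.0, 0.0, 1.0]}
    label, probs = classify_by_prompt([0.9, 0.1, 0.2], prompts)
    print(label)
```

Because the class set is just the keys of `prompt_embeddings`, adding a new category only requires encoding a new prompt; no 3D labels for that class are needed.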
See installation instructions.
See dataset instructions, or directly download the processed dataset.
Phase 1: Learn to Localize 3D Objects from a Pretrained 2D Detector:
bash scripts/scannet_train_loc.sh
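One common way to turn 2D detections into 3D localization supervision is to back-project: keep the scene points whose image projection falls inside a 2D box and take their 3D extent. The sketch below shows that idea under stated assumptions: points are already in the camera frame (z pointing forward), the camera is an ideal pinhole with intrinsics `(fx, fy, cx, cy)`, and the optional median-depth filter is a hypothetical heuristic for dropping background points. It is an illustration of the general technique, not the exact procedure in `scannet_train_loc.sh`.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point to pixel coordinates with a pinhole model."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def pseudo_box_from_2d(points_cam, box_2d, intrinsics, depth_margin=None):
    """Lift one 2D detection to an axis-aligned 3D box (hypothetical helper).

    points_cam: list of (x, y, z) in camera coordinates, z forward.
    box_2d: (u_min, v_min, u_max, v_max) in pixels.
    intrinsics: (fx, fy, cx, cy) of a pinhole camera.
    depth_margin: if set, keep only points within this distance of the
        (upper) median depth, a simple background-rejection heuristic.
    Returns ((x_min, y_min, z_min), (x_max, y_max, z_max)), or None if empty.
    """
    fx, fy, cx, cy = intrinsics
    u_min, v_min, u_max, v_max = box_2d
    inside = []
    for p in points_cam:
        if p[2] <= 0:  # behind the camera: cannot be projected
            continue
        u, v = project_point(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            inside.append(p)

    if depth_margin is not None and inside:
        depths = sorted(p[2] for p in inside)
        median = depths[len(depths) // 2]  # upper median for even counts
        inside = [p for p in inside if abs(p[2] - median) <= depth_margin]

    if not inside:
        return None
    lo = tuple(min(p[i] for p in inside) for i in range(3))
    hi = tuple(max(p[i] for p in inside) for i in range(3))
    return lo, hi
```

For example, with intrinsics `(100, 100, 50, 50)` and a 2D box `(40, 40, 60, 60)`, a foreground point at depth 2 and a background point at depth 5 on the same ray both project inside the box; passing `depth_margin=0.5` keeps only the foreground point.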
Phase 2: Learn to Classify 3D Objects from a Pretrained 2D Vision-Language Model:
bash scripts/scannet_train_dtcc.sh
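Transferring classification ability from a vision-language model usually means aligning 3D features with the model's image or text embeddings through a contrastive objective. As a rough illustration, the sketch below implements a plain symmetric InfoNCE loss over a batch of paired embeddings, where pair `i` is a positive and every other pairing is a negative. This is a generic contrastive loss chosen for clarity; the debiasing and triplet structure of the repository's DTCC training are not reproduced, and the function names and `temperature` default are assumptions.

```python
import math


def _normalize(v):
    """Return v scaled to unit L2 norm (unchanged copy if all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def info_nce(feats_3d, feats_2d, temperature=0.07):
    """Symmetric InfoNCE between paired 3D and 2D/text embeddings.

    feats_3d[i] and feats_2d[i] are assumed to describe the same object;
    all other pairings in the batch act as negatives.
    """
    a = [_normalize(v) for v in feats_3d]
    b = [_normalize(v) for v in feats_2d]
    n = len(a)
    # logits[i][j] = cosine(a_i, b_j) / temperature
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy_diag(mat):
        # Mean cross-entropy where the correct "class" of row i is column i.
        loss = 0.0
        for i, row in enumerate(mat):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
            loss += log_sum_exp - row[i]
        return loss / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    # Average the 3D->2D and 2D->3D directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

Correctly paired embeddings give a loss near zero, while swapping the pairs drives it up, which is the gradient signal that pulls each 3D feature toward its matching vision-language embedding.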
To evaluate OV-3DET, simply run:
bash scripts/evaluate.sh
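3D detection benchmarks built on 3DETR-style evaluation typically decide whether a predicted box matches a ground-truth box by 3D IoU against a threshold such as 0.25 or 0.5 before computing average precision. Assuming `evaluate.sh` follows that convention, the core matching test for axis-aligned boxes looks roughly like the sketch below. The helper names are hypothetical, and the full AP computation (sorting by score, accumulating precision/recall) is omitted.

```python
def box_volume(box):
    """Volume of an axis-aligned box ((x0, y0, z0), (x1, y1, z1)); 0 if degenerate."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(box_a, box_b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    (ax0, ay0, az0), (ax1, ay1, az1) = box_a
    (bx0, by0, bz0), (bx1, by1, bz1) = box_b
    # The overlap region is the max of the mins and the min of the maxes;
    # box_volume clamps negative extents (no overlap) to zero.
    inter = box_volume(((max(ax0, bx0), max(ay0, by0), max(az0, bz0)),
                        (min(ax1, bx1), min(ay1, by1), min(az1, bz1))))
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.25):
    """Hypothetical matching rule: a prediction counts if IoU >= threshold."""
    return iou_3d(pred, gt) >= threshold
```

Two 2 m cubes offset by 1 m along one axis overlap in half their volume, giving IoU 4 / 12 = 1/3: a match at the 0.25 threshold but not at 0.5.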
We provide the pretrained model weights for both "Phase 1" and "Phase 2".
Dataset | Phase | Epochs | Model weights |
---|---|---|---|
ScanNet | 1 | 400 | weights |
ScanNet | 2 | 50 | weights |
SUN RGB-D | 1 | 400 | weights |
SUN RGB-D | 2 | 50 | weights |
This codebase is built on 3DETR [1], CLIP [2], and Detic [3]; we sincerely appreciate their contributions!
[1] An end-to-end transformer model for 3D object detection. ICCV. 2021.
[2] Learning transferable visual models from natural language supervision. ICML. 2021.
[3] Detecting twenty-thousand classes using image-level supervision. ECCV. 2022.
If you find this repository helpful, please consider citing our work:
@inproceedings{lu2023open,
title={Open-Vocabulary Point-Cloud Object Detection without 3D Annotation},
author={Lu, Yuheng and Xu, Chenfeng and Wei, Xiaobao and Xie, Xiaodong and Tomizuka, Masayoshi and Keutzer, Kurt and Zhang, Shanghang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}