(CVPR 2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

¹The University of Hong Kong  ²ByteDance
*Equal contribution  ⁺Corresponding author

CVPR 2023

TL;DR: PLA leverages powerful VL foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.
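To make the TL;DR concrete, here is a minimal sketch of the hierarchical pairing idea: captions produced by a vision-language model are associated with point subsets at different granularities (e.g., the whole scene and individual camera views, with points assigned to a view by projection). All function names, the pinhole-projection setup, and the data layout below are illustrative assumptions, not the repo's actual API.

```python
# Illustrative sketch (NOT the repo's real code): pair point subsets with
# captions at scene level and view level via pinhole projection.
import numpy as np

def view_point_mask(points, cam_pose, intrinsics, image_hw):
    """Boolean mask of 3D points that project inside a camera's image."""
    # Homogenize and transform world points into the camera frame
    # (assuming cam_pose maps world -> camera coordinates).
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam_pts = (cam_pose @ pts_h.T).T[:, :3]
    in_front = cam_pts[:, 2] > 0
    # Pinhole projection onto the image plane.
    uv = (intrinsics @ cam_pts.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    h, w = image_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return in_front & inside

def build_hierarchical_pairs(points, views, scene_caption):
    """Associate captions with point sets at scene and view granularity."""
    # Scene level: every point is paired with the scene-wide caption.
    pairs = [(np.ones(len(points), dtype=bool), scene_caption)]
    # View level: only points visible in a view get that view's caption.
    for cam_pose, intrinsics, image_hw, caption in views:
        mask = view_point_mask(points, cam_pose, intrinsics, image_hw)
        if mask.any():
            pairs.append((mask, caption))
    return pairs
```

The paper's full pipeline additionally derives entity-level pairs, but the same project-then-associate pattern applies at each level of the hierarchy.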

Demo: open-vocabulary queries such as "working space", "piano", and "vending machine".

TODO

  • Release caption processing code

Getting Started

Installation

Please refer to INSTALL.md for the installation.

Dataset Preparation

Please refer to DATASET.md for dataset preparation.

Training & Inference

Please refer to MODEL.md for training and inference scripts and pretrained models.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{ding2022language,
    title={PLA: Language-Driven Open-Vocabulary 3D Scene Understanding},
    author={Ding, Runyu and Yang, Jihan and Xue, Chuhui and Zhang, Wenqing and Bai, Song and Qi, Xiaojuan},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2023}
}

Acknowledgement

Code is partly borrowed from OpenPCDet, PointGroup and SoftGroup.

License

This project is released under the Apache License 2.0.
