> [!IMPORTANT]
> Stay up to date at [opendrivelab.com](https://opendrivelab.com)!

This is the official repository for **Detect Anything 3D in the Wild (DA3D)**, a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations using only monocular inputs.
- [TODO](#todo)
- [Getting Started](#getting-started)
- [Checkpoints](#checkpoints)
- [Dataset Preparation](#dataset-preparation)
- [Training](#training)
- [Inference](#inference)
- [Launch Online Demo](#launch-online-demo)
- [Citation](#citation)
## TODO

- [x] Release full code
- [x] Provide training and inference scripts
- [x] Release the model weights
- [ ] Provide full conversion scripts for constructing DA3D locally
- [ ] Simplify the inference process
- [ ] Provide a tutorial for creating customized datasets and finetuning
## Getting Started

```bash
conda create -n detany3d python=3.8
conda activate detany3d
```
**(1) Install Segment Anything (SAM)**

Follow the official instructions to install SAM and download its checkpoints.
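For reference, a minimal sketch of the usual SAM setup (the official `segment-anything` package and its ViT-H checkpoint; we rename the file to match the `checkpoints/` layout shown below, which is an assumption on our part):

```bash
# Install SAM from the official repository.
pip install git+https://github.com/facebookresearch/segment-anything.git

# Download the ViT-H checkpoint from the official release and rename it
# to match the expected layout under checkpoints/sam_ckpts/.
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O sam_vit_h.pth
```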
**(2) Install UniDepth**

Follow the UniDepth setup guide to compile and install all necessary packages.
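A rough sketch of the typical UniDepth installation (assuming the official `lpiccinelli-eth/UniDepth` repository; its README remains the authoritative source, especially for CUDA-specific extras):

```bash
# Clone and install UniDepth into the detany3d environment.
git clone https://github.com/lpiccinelli-eth/UniDepth.git
cd UniDepth
pip install -e .
cd ..
```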
β (3) Clone and configure GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
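You will also need a GroundingDINO checkpoint (see [Checkpoints](#checkpoints) below). As a sketch, assuming the Swin-T OGC checkpoint from the official releases is the one you want (check the GroundingDINO README for the current release):

```bash
# Download a GroundingDINO checkpoint from the official releases page;
# place it wherever the GroundingDINO documentation instructs.
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```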
The exact dependency versions are listed in our `requirements.txt`.
## Checkpoints

Please download third-party checkpoints from the following sources:

- SAM checkpoint: download `sam_vit_h.pth` from the official SAM GitHub releases.
- UniDepth / DINO checkpoints: available via Google Drive.
```text
detany3d_private/
└── checkpoints/
    ├── sam_ckpts/
    │   └── sam_vit_h.pth
    ├── unidepth_ckpts/
    │   └── unidepth.pth
    ├── dino_ckpts/
    │   └── dino_swin_large.pth
    └── detany3d_ckpts/
        └── detany3d.pth
```
The GroundingDINO checkpoint should be downloaded from its official repo and placed as instructed in its documentation.
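To lay out the tree above, something like the following works (a sketch; it assumes the four checkpoint files were downloaded into the current directory with exactly these names):

```bash
# Create the expected checkpoint directories and move the files into place.
mkdir -p checkpoints/{sam_ckpts,unidepth_ckpts,dino_ckpts,detany3d_ckpts}
mv sam_vit_h.pth checkpoints/sam_ckpts/
mv unidepth.pth checkpoints/unidepth_ckpts/
mv dino_swin_large.pth checkpoints/dino_ckpts/
mv detany3d.pth checkpoints/detany3d_ckpts/
```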
## Dataset Preparation

The `data/` directory should follow the structure below:
```text
data/
├── DA3D_pkls/                  # DA3D processed pickle files
├── kitti/
│   ├── test_depth_front/
│   ├── ImageSets/
│   ├── training/
│   └── testing/
├── nuscenes/
│   ├── nuscenes_depth/
│   └── samples/
├── 3RScan/
│   └── <token folders>/        # e.g., 10b17940-3938-...
├── hypersim/
│   └── depth_in_meter/
│       └── ai_XXX_YYY/         # e.g., ai_055_009
├── waymo/
│   └── kitti_format/           # KITTI-format data for Waymo
│       ├── validation_depth_front/
│       ├── ImageSets/
│       ├── training/
│       └── testing/
├── objectron/
│   ├── train/
│   └── test/
├── ARKitScenes/
│   ├── Training/
│   └── Validation/
├── cityscapes3d/
│   ├── depth/
│   └── leftImg8bit/
└── SUNRGBD/
    ├── realsense/
    ├── xtion/
    ├── kv1/
    └── kv2/
```
The downloads for `kitti`, `nuscenes`, `hypersim`, `objectron`, `arkitscenes`, and `sunrgbd` follow the Omni3D convention. Please refer to the Omni3D repository for details on how to organize and preprocess these datasets.
The `DA3D_pkls` (minimal metadata for inference) can be downloaded from Google Drive.

Note: this release currently supports a minimal inference-only version. Conversion scripts for the full dataset and all depth-related files will be provided later.
⚠️ Depth files are not required for inference. You can safely set `depth_path = None` in `detany3d_dataset.py` to bypass depth loading.
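As an illustration, one way to fetch the pickle files from Google Drive on a headless machine is the third-party `gdown` tool (an assumption on our part; `FILE_ID` is a placeholder for the ID in the actual Drive link, and if the link points to a folder rather than an archive, use `gdown --folder` instead):

```bash
pip install gdown
mkdir -p data/DA3D_pkls
# FILE_ID is a placeholder: copy it from the Google Drive share link.
gdown FILE_ID -O DA3D_pkls.zip
unzip DA3D_pkls.zip -d data/DA3D_pkls
```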
## Training

Multi-node training (8 nodes × 8 GPUs) is launched with `torchrun`:

```bash
torchrun \
    --nproc_per_node=8 \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --nnodes=8 \
    --node_rank=${RANK} \
    ./train.py \
    --config_path \
    ./detect_anything/configs/train.yaml
```
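For a quick single-node run, the same entry point can be launched without the multi-node rendezvous variables (a sketch, assuming one machine with 8 GPUs; adjust `--nproc_per_node` to your GPU count, and note that the config's batch sizes may need retuning for fewer GPUs):

```bash
# Single-node launch: torchrun falls back to a local master address/port.
torchrun \
    --nproc_per_node=8 \
    --nnodes=1 \
    --node_rank=0 \
    ./train.py \
    --config_path ./detect_anything/configs/train.yaml
```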
## Inference

Inference reuses the same entry point with an inference config:

```bash
torchrun \
    --nproc_per_node=8 \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --nnodes=1 \
    --node_rank=${RANK} \
    ./train.py \
    --config_path \
    ./detect_anything/configs/inference_indomain_gt_prompt.yaml
```
After inference, a file named `{dataset}_output_results.json` will be generated in the `exps/<your_exp_dir>/` directory.
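For a quick sanity check of the output file, a hypothetical snippet (the exact JSON schema is not documented here, and `my_exp` / `kitti` are placeholder names):

```bash
python - <<'EOF'
import json

# Placeholder path: substitute your experiment directory and dataset name.
with open("exps/my_exp/kitti_output_results.json") as f:
    results = json.load(f)

# Top-level structure is either a list of detections or a dict of fields.
print(type(results).__name__, len(results))
EOF
```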
⚠️ Due to compatibility issues between `pytorch3d` and the current environment, we recommend copying the output JSON file into the evaluation script of a repository such as Omni3D or OVMono3D for standardized metric evaluation.

TODO: Evaluation for zero-shot datasets currently requires manually modifying the Omni3D or OVMono3D repositories and is not yet fully supported here. We plan to release a merged evaluation script in this repository to make direct evaluation more convenient.
## Launch Online Demo

```bash
python ./deploy.py
```
## Citation

If you find this repository useful, please consider citing:

```bibtex
@article{zhang2025detect,
  title={Detect Anything 3D in the Wild},
  author={Zhang, Hanxue and Jiang, Haoran and Yao, Qingsong and Sun, Yanan and Zhang, Renrui and Zhao, Hao and Li, Hongyang and Zhu, Hongzi and Yang, Zetong},
  journal={arXiv preprint arXiv:2504.07958},
  year={2025}
}
```