CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

This is the official PyTorch implementation of CORA (CVPR 2023).

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023)
Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li

Overview

We propose CORA, a DETR-style framework for open-vocabulary detection (OVD) that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching. Our method achieves state-of-the-art results on both the COCO and LVIS OVD benchmarks.

Environment

Requirements

Linux with Python ≥ 3.9.12
CUDA 11
The provided environment is recommended for reproducing our results; similar configurations may also work.

Quick Start

# environment
conda create -n cora python=3.9.12
conda activate cora
conda install pytorch==1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch

# cora
git clone git@github.com:tgxs002/CORA.git
cd CORA

# other dependencies
pip install -r requirements.txt

# install detectron2
Please install detectron2 as instructed in the official tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/install.html). We use version 0.6 in our experiments.
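
After installation, it can help to confirm that the installed package versions match the ones pinned above. The following is a minimal sketch, not part of this repository: the helper names (`installed_version`, `version_matches`, `check_environment`) are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `detectron2`. It only reads package metadata, so it runs even if a package is missing.

```python
from importlib import metadata

# Versions pinned in the Quick Start commands above.
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0", "detectron2": "0.6"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed, expected):
    """True if `installed` equals `expected`, ignoring local build tags such as '+cu113'."""
    if installed is None:
        return False
    return installed.split("+", 1)[0] == expected


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (have, version_matches(have, want))
    return report


if __name__ == "__main__":
    for name, (have, ok) in check_environment().items():
        status = "ok" if ok else "differs"
        print(f"{name}: installed={have} expected={EXPECTED[name]} [{status}]")
```

A "differs" line is not necessarily an error, since similar configurations may also work; it only flags a deviation from the environment used in our experiments.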

Data Preparation

Check docs/dataset.md for dataset preparation.

In addition to the dataset, we provide the files needed to reproduce our results. Please download the learned region prompts and put them under the logs folder. A guide for training the region prompts is provided in Region Prompting.

Model Zoo

Method	Pretraining Model	Novel	All	Checkpoint
CORA	RN50	35.1	35.4	Checkpoint
CORA	RN50x4	41.7	43.8	Checkpoint

Checkpoints for LVIS and $\text{CORA}^+$ will be released soon.

Evaluation

Run the following command to evaluate the RN50 model:

# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 local --resume /path/to/checkpoint.pth --eval

# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 slurm quota_type partition_name --resume /path/to/checkpoint.pth --eval

If you are using slurm, please remember to replace quota_type and partition_name with your quota type and the partition you are using. To evaluate other models, change the config and checkpoint path accordingly.
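
The local and slurm commands above differ only in the two extra positional arguments that follow `slurm`. The sketch below assembles either form from Python; it is an illustration, not a script shipped with this repository. The function name `build_command` and the parameter names `job_name` and `num_gpus` are hypothetical: the meaning of the first two positional arguments (`test` and `8` above) is inferred from the examples, not documented here.

```python
import shlex

CONFIG = "configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh"


def build_command(config, job_name, num_gpus, mode,
                  quota_type=None, partition=None, extra_args=()):
    """Assemble the launch command as a list of arguments.

    `job_name` and `num_gpus` are assumed names for the first two
    positional arguments shown in the README examples.
    """
    if mode not in ("local", "slurm"):
        raise ValueError("mode must be 'local' or 'slurm'")
    cmd = ["bash", config, job_name, str(num_gpus), mode]
    if mode == "slurm":
        # The slurm form takes the quota type and partition name next.
        if not quota_type or not partition:
            raise ValueError("slurm mode needs quota_type and partition")
        cmd += [quota_type, partition]
    cmd += list(extra_args)
    return cmd


if __name__ == "__main__":
    eval_args = ["--resume", "/path/to/checkpoint.pth", "--eval"]
    print(shlex.join(build_command(CONFIG, "test", 8, "local",
                                   extra_args=eval_args)))
    print(shlex.join(build_command(CONFIG, "test", 8, "slurm",
                                   quota_type="quota_type",
                                   partition="partition_name",
                                   extra_args=eval_args)))
```

The returned list can be passed to `subprocess.run` from the repository root. Leaving out `extra_args` gives the same layout as the training commands in Training Localizer.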

Training Localizer

Before training the localizer, please make sure that the region prompts and relabeled annotations are prepared as instructed in Data Preparation.

Run the following command to train the RN50 model:

# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 local

# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 slurm quota_type partition_name

If you are using slurm, please remember to replace quota_type and partition_name with your quota type and the partition you are using. To train other models, change the config accordingly.

Region Prompting

We provide pre-trained region prompts as specified in Data Preparation. To train and export the region prompts yourself, please refer to the region branch:

git checkout region

CLIP-Aligned Labeling

The code for CLIP-Aligned Labeling will be released soon in another branch of this repository. In the meantime, we provide the pre-computed relabeled annotations as specified in Data Preparation.

Citation and Acknowledgement

Citation

If you find this repo useful, please consider citing our paper:

@article{wu2023cora,
  title={CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching},
  author={Xiaoshi Wu and Feng Zhu and Rui Zhao and Hongsheng Li},
  journal={ArXiv},
  year={2023},
  volume={abs/2303.13076}
}

Acknowledgement

This repository was built on top of SAM-DETR, CLIP, RegionCLIP, and DAB-DETR. We thank the community for their efforts.
