By Wuyang Li
Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the two domains.
This work drops that assumption and formulates Adaptive Open-set Object Detection (AOOD), in which the target domain may also contain novel-class objects.
The object detector uses the base-class labels in the source domain for training, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.
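The target-domain protocol described above can be illustrated with a tiny label-mapping sketch. It is only an illustration: the class names and the base/novel split are made up, and `to_open_set_labels` is a hypothetical helper, not a function from this repository.

```python
from typing import Iterable, List

UNKNOWN = "unknown"  # single label shared by all novel-class objects


def to_open_set_labels(labels: Iterable[str], base_classes: Iterable[str]) -> List[str]:
    """Keep base-class labels and collapse every other label into 'unknown'."""
    base = set(base_classes)
    return [label if label in base else UNKNOWN for label in labels]


# Hypothetical split: three base classes seen in the source domain.
base_classes = ["car", "person", "rider"]
# Hypothetical target-domain ground truth containing two novel classes.
target_labels = ["car", "train", "person", "bicycle"]

print(to_open_set_labels(target_labels, base_classes))
# ['car', 'unknown', 'person', 'unknown']
```

Under this view, a detection on a novel-class object counts as correct only when it is reported as unknown rather than forced into one of the base classes.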
If you have any ideas or problems you would like to discuss, you can reach me via e-mail.
git clone https://github.com/CityU-AIM-Group/SOMA.git
(b) Install the project following Deformable DETR
Note that the following matches our experimental environment, which differs slightly from the official one.
# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100
# Establish the conda environment
conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
# Compile the project
cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
# NOTE: If you meet the permission denied issue when starting the training
cd ../../
chmod -R 777 ./
| | (Foggy) Cityscapes | Pascal VOC | Clipart | BDD100K (Daytime) |
|---|---|---|---|---|
| Official Links | Imgs | Imgs+Labels | - | Imgs |
| Our Links | Labels | - | Imgs+Labels | Labels |
(b) Download DINO-pretrained ResNet-50 from this link
[DATASET_PATH]
└─ Cityscapes
   └─ AOOD_Annotations
   └─ AOOD_Main
      └─ train_source.txt
      └─ train_target.txt
      └─ val_source.txt
      └─ val_target.txt
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ bdd_daytime
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ clipart
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ VOCdevkit
   └─ VOC2007
   └─ VOC2012
For BDD100K daytime, put all images into bdd_daytime/JPEGImages/*.jpg.
The image settings for other benchmarks are consistent with SIGMA.
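A quick way to catch path mistakes before training is to check the layout programmatically. The sketch below assumes the nesting that the directory listing appears to describe (for example, that the four split files sit inside Cityscapes/AOOD_Main). `missing_entries` and the two path lists are illustrative names, not part of this repository.

```python
import tempfile
from pathlib import Path
from typing import List

# Directories expected under [DATASET_PATH]; the nesting is an assumption.
EXPECTED_DIRS = [
    "Cityscapes/AOOD_Annotations",
    "Cityscapes/AOOD_Main",
    "Cityscapes/leftImg8bit/train",
    "Cityscapes/leftImg8bit/val",
    "Cityscapes/leftImg8bit_foggy/train",
    "Cityscapes/leftImg8bit_foggy/val",
    "bdd_daytime/Annotations",
    "bdd_daytime/ImageSets",
    "bdd_daytime/JPEGImages",
    "clipart/Annotations",
    "clipart/ImageSets",
    "clipart/JPEGImages",
    "VOCdevkit/VOC2007",
    "VOCdevkit/VOC2012",
]

# Split files, assumed to live inside Cityscapes/AOOD_Main.
EXPECTED_FILES = [
    "Cityscapes/AOOD_Main/train_source.txt",
    "Cityscapes/AOOD_Main/train_target.txt",
    "Cityscapes/AOOD_Main/val_source.txt",
    "Cityscapes/AOOD_Main/val_target.txt",
]


def missing_entries(root) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


# Demo on a throwaway directory that contains only clipart/JPEGImages.
with tempfile.TemporaryDirectory() as demo_root:
    (Path(demo_root) / "clipart" / "JPEGImages").mkdir(parents=True)
    for path in missing_entries(demo_root):
        print("missing:", path)
```

For a real check, call `missing_entries` with your own data root; an empty list means every listed path was found. Trim the lists to the benchmarks you actually use.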
Replace DATASET.COCO_PATH in all yaml files under configs with your data root $DATASET_PATH, e.g., https://github.com/CityU-AIM-Group/SOMA/blob/41c11cbcb3589376f956950209d5ae3fbc839792/configs/soma_aood_city_to_foggy_r50.yaml#L22
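Editing every config by hand is error-prone, so a small script can do the replacement in one pass. This is a sketch under an assumption: each yaml file stores the path on its own line in the form `COCO_PATH: <value>`. `set_coco_path` is a hypothetical helper, not part of this repository, and the demo runs on a temporary file rather than the real configs.

```python
import re
import tempfile
from pathlib import Path

# Matches a line such as "  COCO_PATH: /old/root" and captures the indented key.
COCO_PATH_LINE = re.compile(r"^([ \t]*COCO_PATH:).*$", flags=re.MULTILINE)


def set_coco_path(config_dir, dataset_root) -> int:
    """Rewrite COCO_PATH in every *.yaml under config_dir; return files changed."""
    changed = 0
    for yaml_file in sorted(Path(config_dir).glob("*.yaml")):
        text = yaml_file.read_text()
        # A function replacement avoids backslash-escape surprises in the path.
        new_text = COCO_PATH_LINE.sub(lambda m: f"{m.group(1)} {dataset_root}", text)
        if new_text != text:
            yaml_file.write_text(new_text)
            changed += 1
    return changed


# Demo on a throwaway config with a made-up old path.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "example.yaml"
    cfg.write_text("DATASET:\n  COCO_PATH: /old/root\n")
    set_coco_path(tmp, "/data/aood")
    print(cfg.read_text())
```

To apply it to the repository, something like `set_coco_path("configs", "/path/to/your/data")` from the project root should work, provided the assumption about the line format holds. Check the result with `git diff` before training.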
Replace the backbone loading path: https://github.com/CityU-AIM-Group/SOMA/blob/41c11cbcb3589376f956950209d5ae3fbc839792/models/backbone.py#L107
We use two GPUs for training with 2 source images and 2 target images as input. Per-epoch evaluation results are saved to the generated eval_results.txt file in OUTPUT_DIR, in LaTeX table format.
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main_multi_eval.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1
We provide some of the scripts used in our experiments in run.sh. Settings given after "--opts" override the default config file, as in the maskrcnn-benchmark framework.
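The override behaviour can be pictured as a flat list of KEY VALUE pairs applied on top of nested defaults. maskrcnn-benchmark does this through yacs (`merge_from_list`); the sketch below is a dependency-free stand-in, not this repository's code. `apply_opts` is a hypothetical name and the default values shown are made up.

```python
import ast
from typing import Any, Dict, List


def apply_opts(config: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply ["SECTION.KEY", "value", ...] pairs to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be a flat list of KEY VALUE pairs")
    for dotted_key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # KeyError if the section is not in the defaults
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        try:
            value = ast.literal_eval(raw_value)  # "1" -> 1, "True" -> True
        except (ValueError, SyntaxError):
            value = raw_value  # keep plain strings such as file paths
        node[leaf] = value
    return config


# Made-up defaults; the real ones live in the repository's config files.
defaults = {"DATASET": {"AOOD_SETTING": 0}, "AOOD": {"OW_DETR_ON": False}}
apply_opts(defaults, ["DATASET.AOOD_SETTING", "1", "AOOD.OW_DETR_ON", "True"])
print(defaults)
# {'DATASET': {'AOOD_SETTING': 1}, 'AOOD': {'OW_DETR_ON': True}}
```

Rejecting unknown keys, as the sketch does, means a misspelled option fails loudly instead of being silently ignored.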
Will be provided later
- The core idea is to select informative motifs (which can be treated as the mix-up of object queries) for self-training.
- You can try the DA version of OW-DETR in this repository by setting:
--opts AOOD.OW_DETR_ON True
- Adopting SAM to address AOOD may be a good direction.
- To visualize unknown boxes, post-processing is needed in PostProcess.
If you find this work helpful for your project, please give it a star and cite it. We sincerely appreciate your acknowledgment.
@InProceedings{Li_2023_ICCV,
author = {Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
title = {Novel Scenes \& Classes: Towards Adaptive Open-set Object Detection},
booktitle = {ICCV},
year = {2023},
}
Relevant project:
Exploring a similar task for image classification. [link]
@InProceedings{Li_2023_CVPR,
author = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
title = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
booktitle = {CVPR},
year = {2023},
}
We greatly appreciate the tremendous effort behind the following works.
- This work is based on the DAOD framework AQT.
- Our work is highly inspired by OW-DETR and OpenDet.
- The implementation of the basic detector is based on Deformable DETR.
Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, novel domains contain not only novel scenes but also novel-class objects, which existing research ignores. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal because their low-order dependence, such as the confidence score, is insufficient for AOOD with its two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models the high-order relation with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). In SNL, we establish an instance-oriented graph to capture the class-independent object features hidden in different base classes. Then, a high-order metric is proposed to match the most significant motifs as high-order patterns, which serve motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.