InstaGen: Enhancing Object Detection by Training on Synthetic Dataset (CVPR 2024)

Introduction

In this paper, we present a novel paradigm to enhance the ability of an object detector, e.g., expanding its categories or improving its detection performance, by training on a synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained generative diffusion model, augmenting it with the ability to localise instances in the generated images. The grounding head is trained to align the text embeddings of category names with the regional visual features of the diffusion model, using supervision from an off-the-shelf object detector and a novel self-training scheme on (novel) categories not covered by the detector. We conduct thorough experiments to show that this enhanced diffusion model, termed InstaGen, can serve as a data synthesizer: object detectors trained on its generated samples demonstrate superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 ∼ 5.2 AP) scenarios.

Methodology

Updates

April 7, 2024: added the code and models for the open-vocabulary COCO benchmark
February 8, 2024: initial release

Synthetic Dataset

Prerequisites

# Step 1. Create a conda environment and activate it
conda create --name instagen python=3.8 -y
conda activate instagen

# Step 2. Install the requirements for SDM fine-tuning
cd InstaGen/
pip install -r requirements.txt

# Step 3. Install mmdetection
cd mmdetection/
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
pip install -v -e .

Demo

We provide a simple demo that uses InstaGen, together with the original weights of the Stable Diffusion model (SDM), to generate images and their bounding boxes.

Download the original model weights of SDM, listed as sd-v1-4.ckpt, to 'checkpoints/stable-diffusion-v-1-4-original/'
Download the model weights of InstaGen, listed as instagen-4scale_fd_8xb2-12e_coco.pth, to 'mmdetection/checkpoints/'
Run demo:

sh instagen_scripts/demo_instagen.sh
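
Before running the demo, it can help to confirm that both checkpoints are where the steps above expect them. The following is a minimal sketch, not part of the InstaGen repo: `check_files` is a hypothetical helper, and the file paths are assumed from the download steps above, so adjust them if your layout differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the InstaGen repo): print every argument
# that is not an existing regular file and return non-zero if any is missing.
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths assumed from the download steps above; run from the repository root.
if check_files \
    "checkpoints/stable-diffusion-v-1-4-original/sd-v1-4.ckpt" \
    "mmdetection/checkpoints/instagen-4scale_fd_8xb2-12e_coco.pth"; then
    echo "all demo checkpoints found"
else
    echo "download the missing checkpoints before running the demo" >&2
fi
```

The same helper can be reused for the fine-tuning and training stages by passing the files those stages require.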

Fine-tune SDM

Download the COCO dataset to 'mmdetection/data/coco'
Download the annotation of the base categories to 'mmdetection/data/coco/annotations/'
Download the original model weights of SDM, listed as sd-v1-4-full-ema.ckpt, to 'checkpoints/stable-diffusion-v-1-4-original/'
Fine-tune SDM:

sh instagen_scripts/finetune_sdm.sh

Train InstaGen

Download the model weights of the pre-trained detector to 'mmdetection/checkpoints/'
To enhance training efficiency, we generate images in advance and store the latent representations during the second-to-last denoising step:

sh instagen_scripts/generate_image.sh

Generate annotation files:

sh instagen_scripts/generate_base_ann.sh

Generate class embeddings:

sh instagen_scripts/generate_class_embedding.sh

Train InstaGen:

sh instagen_scripts/train_instagen.sh
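
The four commands in this section depend on each other's output, so a later step should not start if an earlier one fails. As a rough sketch under that assumption, the hypothetical `run_steps` wrapper below (not part of the repo) runs the scripts in the order listed above and stops at the first failure. It assumes you invoke it from the same directory in which you would run the individual commands.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the InstaGen repo): run each script in
# order and stop at the first one that exits with a non-zero status.
run_steps() {
    for script in "$@"; do
        echo "==> $script"
        if ! sh "$script"; then
            echo "failed: $script" >&2
            return 1
        fi
    done
}

# Only run when the script directory is present in the current directory.
if [ -d instagen_scripts ]; then
    run_steps \
        instagen_scripts/generate_image.sh \
        instagen_scripts/generate_base_ann.sh \
        instagen_scripts/generate_class_embedding.sh \
        instagen_scripts/train_instagen.sh
fi
```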

Train detector

Predict the pseudo-labels of the synthetic images with InstaGen:

sh instagen_scripts/infer_instagen.sh

Generate annotation file for training detector:

sh instagen_scripts/generate_novel_ann.sh

Train detector:

sh instagen_scripts/train_detector.sh
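
Pseudo-labelling and detector training can run for a long time, so you may want to keep the output of each stage in its own log file. The sketch below is one way to do that. `run_logged` and the `logs/` directory are hypothetical names introduced here for illustration and are not provided by the repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of the InstaGen repo): run a script, write its
# stdout and stderr to <log_dir>/<script name>.log, and keep its exit status.
run_logged() {
    script=$1
    log_dir=${2:-logs}
    mkdir -p "$log_dir"
    log="$log_dir/$(basename "$script" .sh).log"
    if sh "$script" >"$log" 2>&1; then
        echo "done: $script (log: $log)"
    else
        status=$?
        echo "failed: $script (exit $status, log: $log)" >&2
        return "$status"
    fi
}

# Only run when the script directory is present in the current directory.
if [ -d instagen_scripts ]; then
    run_logged instagen_scripts/infer_instagen.sh &&
    run_logged instagen_scripts/generate_novel_ann.sh &&
    run_logged instagen_scripts/train_detector.sh
fi
```

Because each call returns the script's own exit status, the `&&` chain stops as soon as one stage fails, and the corresponding log file shows why.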

Inference

Detector inference:

sh instagen_scripts/infer_detector.sh

Dataset & Models

For your convenience, we provide the synthetic dataset and the trained models for the open-vocabulary COCO benchmark.

The synthetic dataset can be downloaded here
Models

Model	AP50_all	AP50_base	AP50_novel	Download
Fine-tuned SDM	--	--	--	google
InstaGen	--	--	--	google
Faster RCNN	52.2	55.7	42.4	google

Acknowledgement

Thanks to the Stable Diffusion, Stable Diffusion Finetuning, Grounded Diffusion and MMDetection teams for their wonderful open-source projects!

Citation

If you find InstaGen useful in your research, please consider citing:

@inproceedings{feng2024instagen,
    title={InstaGen: Enhancing Object Detection by Training on Synthetic Dataset},
    author={Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Xie, Weidi and Ma, Lin},
    booktitle={Proceedings of the IEEE / CVF Computer Vision and Pattern Recognition},
    year={2024}
}

About

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024

https://fcjian.github.io/InstaGen

MIT License
