DiffusionVID for Video Object Detection


By Si-Dong Roh (sdroh1027@naver.com) and Ki-Seok Chung, Hanyang University.

This project is an official implementation of "DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection", IEEE Access, 2023.

Citing DiffusionVID

If our code is helpful, please consider citing our work:

@ARTICLE{diffusionvid,
author={Roh, Si-Dong and Chung, Ki-Seok},
journal={IEEE Access}, 
title={DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection}, 
year={2023},
doi={10.1109/ACCESS.2023.3328341}}

@ARTICLE{dafa,
author={Roh, Si-Dong and Chung, Ki-Seok},
journal={IEEE Access}, 
title={DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection}, 
year={2022},
volume={10},
pages={93453-93463},
doi={10.1109/ACCESS.2022.3203399}}

Main Results

| Model | Backbone | AP50 | Link |
|-------|----------|------|------|
| single frame baseline | ResNet-101 | 76.7 | Google |
| DFF | ResNet-101 | 75.0 | Google |
| FGFA | ResNet-101 | 78.0 | Google |
| RDN-base | ResNet-101 | 81.1 | Google |
| RDN | ResNet-101 | 81.7 | Google |
| MEGA | ResNet-101 | 82.9 | Google |
| DAFA | ResNet-101 | 84.5 | Google |
| DiffusionVID (x1) | ResNet-101 | 86.9 | Google |
| DiffusionVID (x4) | ResNet-101 | 87.1 | |
| DiffusionVID (x1) | Swin-Base | 92.4 | Google |
| DiffusionVID (x4) | Swin-Base | 92.5 | |

The links for the previous models (single frame baseline, DFF, FGFA, RDN, MEGA) are from the MEGA repository.

Installation

Please follow INSTALL.md for installation instructions.

Data preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here.

The official link for VID has expired; please use this alternative instead: here

After downloading, we recommend symlinking the datasets into datasets/. The path structure should be as follows:

./datasets/ILSVRC2015/
./datasets/ILSVRC2015/Annotations/DET
./datasets/ILSVRC2015/Annotations/VID
./datasets/ILSVRC2015/Data/DET
./datasets/ILSVRC2015/Data/VID
./datasets/ILSVRC2015/ImageSets
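
For example, assuming the dataset was extracted to /path/to/ILSVRC2015 (an illustrative path; substitute your own), the symlink can be created like this:

# Replace /path/to/ILSVRC2015 with wherever you extracted the dataset.
mkdir -p datasets
ln -s /path/to/ILSVRC2015 datasets/ILSVRC2015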

Note: We have already provided the lists of all images used to train and test our model as .txt files under the directory datasets/ILSVRC2015/ImageSets. You do not need to change them.

Model preparation

To test our model, download the .pth files from the links in the Main Results section. You may place them anywhere, but you should adjust the MODEL.WEIGHT option on the command line accordingly.

If you want to train from scratch, download the pretrained backbone models (R101, SwinB).

Your pretrained models must be placed here:

./models
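
For example, assuming the pretrained weights were downloaded to your current directory (the angle-bracket names below are placeholders; keep whatever filenames the downloads use):

# Move the downloaded pretrained backbone weights into ./models
mkdir -p models
mv <R101_pretrained.pth> <SwinB_pretrained.pth> models/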

Usage

Note: Cache files will be created the first time you run this project; this may take some time.

Inference

The single-GPU inference command lines for testing on the validation dataset:

# 1gpu inference (R101):
python tools/test_net.py \
    --config-file configs/vid_R_101_DiffusionVID.yaml \
    MODEL.WEIGHT <path_of_your_model.pth> \
    DTYPE float16

# 1gpu inference (SwinB):
python tools/test_net.py \
    --config-file configs/vid_Swin_B_DiffusionVID.yaml \
    MODEL.WEIGHT <path_of_your_model.pth> \
    DTYPE float16

The 4-GPU inference command line for testing on the validation dataset:

# 4gpu inference (R101):
python -m torch.distributed.launch \
    --nproc_per_node 4 \
    tools/test_net.py \
    --config-file configs/vid_R_101_DiffusionVID.yaml \
    MODEL.WEIGHT <path_of_your_model.pth> \
    DTYPE float16

Please note that:

  1. If you want to evaluate a different model, please change --config-file and MODEL.WEIGHT.
  2. If you want to evaluate motion-IoU specific AP, simply add --motion-specific (see the example after this list).
  3. Since testing on the 170,000+ validation frames is very time-consuming, we support evaluating directly from previously generated bounding boxes, which are automatically saved to a file named predictions.pth in your training directory. This means you do not need to rerun inference from scratch every time. You can do this by running:
    python tools/test_prediction.py \
        --config-file configs/vid_R_101_DiffusionVID.yaml \
        --prediction <path_of_your_predictions.pth>
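
For example, a motion-IoU specific evaluation on a single GPU would combine the flag with the standard inference options (a sketch, assuming --motion-specific composes with test_net.py exactly as in the commands above):

# Single-GPU inference with motion-IoU specific AP evaluation (R101)
python tools/test_net.py \
    --config-file configs/vid_R_101_DiffusionVID.yaml \
    --motion-specific \
    MODEL.WEIGHT <path_of_your_model.pth> \
    DTYPE float16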

Training

The following command line will train DiffusionVID on 4 GPUs with Synchronous Stochastic Gradient Descent (SGD):

python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/vid_R_101_DiffusionVID.yaml \
    OUTPUT_DIR training_dir/DiffusionVID_R_101_your_model_name

Please note that:

  1. The models will be saved into OUTPUT_DIR.
  2. If you want to train other methods or other backbones, please change --config-file (see the example below).
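
For example, to train with the Swin-Base backbone instead, point the same command at the Swin config (the OUTPUT_DIR name below is only an illustration):

# 4-GPU training with the Swin-Base backbone
python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --config-file configs/vid_Swin_B_DiffusionVID.yaml \
    OUTPUT_DIR training_dir/DiffusionVID_Swin_B_your_model_name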

Much of our code is based on MEGA and DiffusionDet. We thank the authors for making their code publicly available.

License

This project is released under the Apache License 2.0.

