PTSEFormer

This repo is the official implementation of PTSEFormer, accepted at ECCV 2022. Paper: https://arxiv.org/abs/2209.02242

Citing PTSEFormer

Please consider citing our paper if you find it useful:

@inproceedings{ptseformer,
    Author = {Wang, Han and Tang, Jun and Liu, Xiaodong and Guan, Shanyan and Xie, Rong and Song, Li},
    Title = {PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection},
    Booktitle = {ECCV},
    Year = {2022}
}

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    conda create -n PTSEFormer python=3.7 pip
    

    Then, activate the environment:

    conda activate PTSEFormer
    
  • PyTorch>=1.5.1, torchvision>=0.6.1

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
    
  • Other requirements

    pip install -r requirements.txt
    
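After installing, a quick sanity check (a minimal sketch, not a repo utility) can confirm that the interpreter sees the expected PyTorch, torchvision, and CUDA versions:

    # Environment sanity check (illustrative, not part of the repo).
    import torch
    import torchvision

    print("torch:", torch.__version__)              # expect >= 1.5.1
    print("torchvision:", torchvision.__version__)  # expect >= 0.6.1
    print("CUDA available:", torch.cuda.is_available())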

Data preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here and organize them as follows:

data_root/
└── ILSVRC2015/
    ├── ImageSets/
    ├── Annotations/
    └── Data/
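Before training or evaluation, you can confirm the layout matches the tree above. A minimal sketch (data_root is a placeholder for your own path; this helper is not part of the repo):

    # Check the expected ILSVRC2015 directory layout (illustrative helper).
    from pathlib import Path

    data_root = Path("data_root")  # replace with your actual data root
    for sub in ("ImageSets", "Annotations", "Data"):
        d = data_root / "ILSVRC2015" / sub
        print(d, "OK" if d.is_dir() else "MISSING")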

Evaluation

To run inference on the validation set (set --nproc_per_node to the number of available GPUs):

python -m torch.distributed.launch --nproc_per_node=8 tools/test.py --config-file experiments/PTSEFormer_r101_8gpus.yaml

The pretrained model can be found here.

Training

To train on the combined VID and DET datasets:

python -m torch.distributed.launch --nproc_per_node=8 tools/train.py --config-file experiments/PTSEFormer_r101_8gpus.yaml
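Both commands read their settings from the file passed via --config-file. If you want to inspect the experiment settings before launching, a minimal sketch, assuming the config is plain YAML:

    # Peek at the experiment configuration (assumes a plain-YAML file).
    import yaml  # pip install pyyaml

    with open("experiments/PTSEFormer_r101_8gpus.yaml") as f:
        cfg = yaml.safe_load(f)

    for key, value in cfg.items():
        print(key, ":", value)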


License

This project is released under the MIT License.

