PTSEFormer

This repo is the official implementation of PTSEFormer, accepted at ECCV 2022. Paper: https://arxiv.org/abs/2209.02242

Citing PTSEFormer

Please consider citing our paper if you find it useful:

@inproceedings{ptseformer,
    Author = {Wang, Han and Tang, Jun and Liu, Xiaodong and Guan, Shanyan and Xie, Rong and Song, Li},
    Title = {PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection},
    Booktitle = {ECCV},
    Year = {2022}
}

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    conda create -n PTSEFormer python=3.7 pip
    

    Then, activate the environment:

    conda activate PTSEFormer
    
  • PyTorch>=1.5.1, torchvision>=0.6.1

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
    
  • Other requirements

    pip install -r requirements.txt
    
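After installing, a quick sanity check (a minimal sketch, not a repo utility) can confirm that the interpreter sees the expected PyTorch, torchvision, and CUDA versions:

    # Environment sanity check (illustrative, not part of the repo).
    import torch
    import torchvision

    print("torch:", torch.__version__)              # expect >= 1.5.1
    print("torchvision:", torchvision.__version__)  # expect >= 0.6.1
    print("CUDA available:", torch.cuda.is_available())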

Data preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here and organize them as follows:

data_root/
└── ILSVRC2015/
    ├── ImageSets/
    ├── Annotations/
    └── Data/
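Before training or evaluation, you can confirm the layout matches the tree above. A minimal sketch (data_root is a placeholder for your own path; this helper is not part of the repo):

    # Check the expected ILSVRC2015 directory layout (illustrative helper).
    from pathlib import Path

    data_root = Path("data_root")  # replace with your actual data root
    for sub in ("ImageSets", "Annotations", "Data"):
        d = data_root / "ILSVRC2015" / sub
        print(d, "OK" if d.is_dir() else "MISSING")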

Evaluation

To run inference on the validation set (set --nproc_per_node to the number of available GPUs):

python -m torch.distributed.launch --nproc_per_node=8 tools/test.py --config-file experiments/PTSEFormer_r101_8gpus.yaml

The pretrained model can be found here.

Training

To train on the combined VID and DET datasets:

python -m torch.distributed.launch --nproc_per_node=8 tools/train.py --config-file experiments/PTSEFormer_r101_8gpus.yaml
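Both commands read their settings from the file passed via --config-file. If you want to inspect the experiment settings before launching, a minimal sketch, assuming the config is plain YAML:

    # Peek at the experiment configuration (assumes a plain-YAML file).
    import yaml  # pip install pyyaml

    with open("experiments/PTSEFormer_r101_8gpus.yaml") as f:
        cfg = yaml.safe_load(f)

    for key, value in cfg.items():
        print(key, ":", value)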


License

This project is released under the MIT License.

