Transformer Tracking

This repository is a paper digest of Transformer-like approaches to video tracking tasks. It currently covers Single Object Tracking (SOT), Video Object Segmentation (VOS), Multiple Object Tracking (MOT), Object Re-Identification (ReID), Video Instance Segmentation (VIS), and Video Object Detection (VOD). Some trackers that use a non-local attention mechanism are also included.

🔖Single Object Tracking (SOT)

ICPR 2020:tada:

VTT (VTT: Long-term Visual Tracking with Transformers) [paper]

CVPR 2021:tada:

SiamGAT (Graph Attention Tracking) [paper]
STMTrack (STMTrack: Template-free Visual Tracking with Space-time Memory Networks) [paper]
TransT (Transformer Tracking) [paper]
TMT (Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking) [paper]

ICCV 2021:tada:

SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
HiFT (HiFT: Hierarchical Feature Transformer for Aerial Tracking) [paper]
DTT (High-Performance Discriminative Tracking With Transformers) [paper]
STARK (Learning Spatio-Temporal Transformer for Visual Tracking) [paper]
DualTFR (Learning Tracking Representations via Dual-Branch Fully Transformer Networks) [paper]

CoRR 2021:tada:

TREG (Target Transformed Regression for Accurate Tracking) [paper]
TrTr (TrTr: Visual Tracking with Transformer) [paper]
E.T.Track (Efficient Visual Tracking with Exemplar Transformers) [paper]
SwinTrack (SwinTrack: A Simple and Strong Baseline for Transformer Tracking) [paper]

WACV 2022:tada:

SiamTPN (Siamese Transformer Pyramid Networks for Real-Time UAV Tracking) [paper]

CoRR 2022:tada:

InMo (Learning Target-aware Representation for Visual Tracking via Informative Interactions) [paper]

🔖Video Object Segmentation (VOS)

ICCV 2019:tada:

STM (Video Object Segmentation using Space-Time Memory Networks) [paper]

ECCV 2020:tada:

KMN (Kernelized Memory Network for Video Object Segmentation) [paper]
GCM (Fast Video Object Segmentation using the Global Context Module) [paper]
GraphMemVOS (Video Object Segmentation with Episodic Graph Memory Networks) [paper]

NeurIPS 2020:tada:

AFB-URR (Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement) [paper]

AAAI 2021:tada:

STG-Net (Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation) [paper]

CVPR 2021:tada:

LCM (Learning Position and Target Consistency for Memory-based Video Object Segmentation) [paper]
RMNet (Efficient Regional Memory Network for Video Object Segmentation) [paper]
SwiftNet (SwiftNet: Real-time Video Object Segmentation) [paper]
SSTVOS (SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation) [paper]

ICCV 2021:tada:

SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
JOINT (Joint Inductive and Transductive Learning for Video Object Segmentation) [paper]

NeurIPS 2021:tada:

AOT (Associating Objects with Transformers for Video Object Segmentation) [paper]
STCN (Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation) [paper]

CoRR 2021:tada:

TransVOS (TransVOS: Video Object Segmentation with Transformers) [paper]
MTTR (End-to-End Referring Video Object Segmentation with Multimodal Transformers) [paper]

WACV 2022:tada:

BMVOS (Pixel-Level Bijective Matching for Video Object Segmentation) [paper]

AAAI 2022:tada:

SITVOS (Siamese Network with Interactive Transformer for Video Object Segmentation) [paper]

CoRR 2022:tada:

ReferFormer (Language as Queries for Referring Video Object Segmentation) [paper]

🔖Multiple Object Tracking (MOT)

CoRR 2021:tada:

RelationTrack (RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation) [paper]
TransTrack (TransTrack: Multiple Object Tracking with Transformer) [paper]
TrackFormer (TrackFormer: Multi-Object Tracking with Transformers) [paper]
TransMOT (TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking) [paper]
TransCenter (TransCenter: Transformers with Dense Queries for Multiple-Object Tracking) [paper]
MOTR (MOTR: End-to-End Multiple-Object Tracking with TRansformer) [paper]
MO3TR (Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers) [paper]

🔖Object Re-Identification (ReID)

ICCV 2021:tada:

TransReID (TransReID: Transformer-based Object Re-Identification) [paper]

MM 2021:tada:

HAT (HAT: Hierarchical Aggregation Transformers for Person Re-identification) [paper]

CoRR 2021:tada:

TMT (A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification) [paper]
STT (Spatiotemporal Transformer for Video-based Person Re-identification) [paper]

🔖Video Instance Segmentation (VIS)

CVPR 2021:tada:

VisTR (End-to-End Video Instance Segmentation with Transformers) [paper]

NeurIPS 2021:tada:

IFC (Video Instance Segmentation using Inter-Frame Communication Transformers) [paper]

CoRR 2021:tada:

QueryTrack (Tracking Instances as Queries) [paper]
Mask2Former (Mask2Former for Video Instance Segmentation) [paper]

🔖Video Object Detection (VOD)

CoRR 2021:tada:

TransVOD (End-to-End Video Object Detection with Spatial-Temporal Transformers) [paper]

CoRR 2022:tada:

TransVOD++ (TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers) [paper]
