Transformer Tracking
This repository is a paper digest of Transformer-like approaches to video tracking tasks. The tasks currently covered are Single Object Tracking (SOT), Video Object Segmentation (VOS), Multiple Object Tracking (MOT), Object Re-Identification (ReID), Video Instance Segmentation (VIS), and Video Object Detection (VOD). Some trackers that use a non-local attention mechanism are also included.
🔖Single Object Tracking (SOT)
ICPR 2020:tada:
- VTT (VTT: Long-term Visual Tracking with Transformers) [paper]
CVPR 2021:tada:
- SiamGAT (Graph Attention Tracking) [paper]
- STMTrack (STMTrack: Template-free Visual Tracking with Space-time Memory Networks) [paper]
- TransT (Transformer Tracking) [paper]
- TMT (Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking) [paper]
ICCV 2021:tada:
- SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
- HiFT (HiFT: Hierarchical Feature Transformer for Aerial Tracking) [paper]
- DTT (High-Performance Discriminative Tracking With Transformers) [paper]
- STARK (Learning Spatio-Temporal Transformer for Visual Tracking) [paper]
- DualTFR (Learning Tracking Representations via Dual-Branch Fully Transformer Networks) [paper]
CoRR 2021:tada:
- TREG (Target Transformed Regression for Accurate Tracking) [paper]
- TrTr (TrTr: Visual Tracking with Transformer) [paper]
- E.T.Track (Efficient Visual Tracking with Exemplar Transformers) [paper]
- SwinTrack (SwinTrack: A Simple and Strong Baseline for Transformer Tracking) [paper]
WACV 2022:tada:
- SiamTPN (Siamese Transformer Pyramid Networks for Real-Time UAV Tracking) [paper]
CoRR 2022:tada:
- InMo (Learning Target-aware Representation for Visual Tracking via Informative Interactions) [paper]
🔖Video Object Segmentation (VOS)
ICCV 2019:tada:
- STM (Video Object Segmentation using Space-Time Memory Networks) [paper]
ECCV 2020:tada:
- KMN (Kernelized Memory Network for Video Object Segmentation) [paper]
- GCM (Fast Video Object Segmentation using the Global Context Module) [paper]
- GraphMemVOS (Video Object Segmentation with Episodic Graph Memory Networks) [paper]
NeurIPS 2020:tada:
- AFB-URR (Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement) [paper]
AAAI 2021:tada:
- STG-Net (Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation) [paper]
CVPR 2021:tada:
- LCM (Learning Position and Target Consistency for Memory-based Video Object Segmentation) [paper]
- RMNet (Efficient Regional Memory Network for Video Object Segmentation) [paper]
- SwiftNet (SwiftNet: Real-time Video Object Segmentation) [paper]
- SSTVOS (SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation) [paper]
ICCV 2021:tada:
- SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
- JOINT (Joint Inductive and Transductive Learning for Video Object Segmentation) [paper]
NeurIPS 2021:tada:
- AOT (Associating Objects with Transformers for Video Object Segmentation) [paper]
- STCN (Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation) [paper]
CoRR 2021:tada:
- TransVOS (TransVOS: Video Object Segmentation with Transformers) [paper]
- MTTR (End-to-End Referring Video Object Segmentation with Multimodal Transformers) [paper]
WACV 2022:tada:
- BMVOS (Pixel-Level Bijective Matching for Video Object Segmentation) [paper]
AAAI 2022:tada:
- SITVOS (Siamese Network with Interactive Transformer for Video Object Segmentation) [paper]
CoRR 2022:tada:
- ReferFormer (Language as Queries for Referring Video Object Segmentation) [paper]
🔖Multiple Object Tracking (MOT)
CoRR 2021:tada:
- RelationTrack (RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation) [paper]
- TransTrack (TransTrack: Multiple Object Tracking with Transformer) [paper]
- TrackFormer (TrackFormer: Multi-Object Tracking with Transformers) [paper]
- TransMOT (TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking) [paper]
- TransCenter (TransCenter: Transformers with Dense Queries for Multiple-Object Tracking) [paper]
- MOTR (MOTR: End-to-End Multiple-Object Tracking with TRansformer) [paper]
- MO3TR (Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers) [paper]
🔖Object Re-Identification (ReID)
CVPR 2021:tada:
- PAT (Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer) [paper]
ICCV 2021:tada:
- TransReID (TransReID: Transformer-based Object Re-Identification) [paper]
- APD (Transformer Meets Part Model: Adaptive Part Division for Person Re-Identification) [paper]
ACM MM 2021:tada:
- HAT (HAT: Hierarchical Aggregation Transformers for Person Re-identification) [paper]
CoRR 2021:tada:
- TMT (A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification) [paper]
- STT (Spatiotemporal Transformer for Video-based Person Re-identification) [paper]
- AAformer (AAformer: Auto-Aligned Transformer for Person Re-Identification) [paper]
🔖Video Instance Segmentation (VIS)
CVPR 2021:tada:
- VisTR (End-to-End Video Instance Segmentation with Transformers) [paper]
NeurIPS 2021:tada:
- IFC (Video Instance Segmentation using Inter-Frame Communication Transformers) [paper]
CoRR 2021:tada:
- QueryTrack (Tracking Instances as Queries) [paper]
- Mask2Former (Mask2Former for Video Instance Segmentation) [paper]
AAAI 2022:tada:
- HITF (Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation) [paper]
🔖Video Object Detection (VOD)
CoRR 2021:tada:
- TransVOD (End-to-End Video Object Detection with Spatial-Temporal Transformers) [paper]
CoRR 2022:tada:
- TransVOD++ (TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers) [paper]