whjzsy / Transformer_Tracking

This repository is a paper digest of Transformer-like approaches to video tracking tasks.

Transformer Tracking

This repository is a paper digest of Transformer-like approaches to video tracking tasks. It currently covers Single Object Tracking (SOT), Video Object Segmentation (VOS), Multiple Object Tracking (MOT), Object Re-Identification (ReID), Video Instance Segmentation (VIS), and Video Object Detection (VOD). Trackers that use a non-local attention mechanism are also included.

🔖 Single Object Tracking (SOT)

ICPR 2020:tada:

  • VTT (VTT: Long-term Visual Tracking with Transformers) [paper]

CVPR 2021:tada:

  • SiamGAT (Graph Attention Tracking) [paper]
  • STMTrack (STMTrack: Template-free Visual Tracking with Space-time Memory Networks) [paper]
  • TransT (Transformer Tracking) [paper]
  • TMT (Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking) [paper]

ICCV 2021:tada:

  • SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
  • HiFT (HiFT: Hierarchical Feature Transformer for Aerial Tracking) [paper]
  • DTT (High-Performance Discriminative Tracking With Transformers) [paper]
  • STARK (Learning Spatio-Temporal Transformer for Visual Tracking) [paper]
  • DualTFR (Learning Tracking Representations via Dual-Branch Fully Transformer Networks) [paper]

CoRR 2021:tada:

  • TREG (Target Transformed Regression for Accurate Tracking) [paper]
  • TrTr (TrTr: Visual Tracking with Transformer) [paper]
  • E.T.Track (Efficient Visual Tracking with Exemplar Transformers) [paper]
  • SwinTrack (SwinTrack: A Simple and Strong Baseline for Transformer Tracking) [paper]

WACV 2022:tada:

  • SiamTPN (Siamese Transformer Pyramid Networks for Real-Time UAV Tracking) [paper]

CoRR 2022:tada:

  • InMo (Learning Target-aware Representation for Visual Tracking via Informative Interactions) [paper]

🔖 Video Object Segmentation (VOS)

ICCV 2019:tada:

  • STM (Video Object Segmentation using Space-Time Memory Networks) [paper]

ECCV 2020:tada:

  • KMN (Kernelized Memory Network for Video Object Segmentation) [paper]
  • GCM (Fast Video Object Segmentation using the Global Context Module) [paper]
  • GraphMemVOS (Video Object Segmentation with Episodic Graph Memory Networks) [paper]

NeurIPS 2020:tada:

  • AFB-URR (Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement) [paper]

AAAI 2021:tada:

  • STG-Net (Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation) [paper]

CVPR 2021:tada:

  • LCM (Learning Position and Target Consistency for Memory-based Video Object Segmentation) [paper]
  • RMNet (Efficient Regional Memory Network for Video Object Segmentation) [paper]
  • SwiftNet (SwiftNet: Real-time Video Object Segmentation) [paper]
  • SSTVOS (SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation) [paper]

ICCV 2021:tada:

  • SAMN (Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking) [paper]
  • JOINT (Joint Inductive and Transductive Learning for Video Object Segmentation) [paper]

NeurIPS 2021:tada:

  • AOT (Associating Objects with Transformers for Video Object Segmentation) [paper]
  • STCN (Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation) [paper]

CoRR 2021:tada:

  • TransVOS (TransVOS: Video Object Segmentation with Transformers) [paper]
  • MTTR (End-to-End Referring Video Object Segmentation with Multimodal Transformers) [paper]

WACV 2022:tada:

  • BMVOS (Pixel-Level Bijective Matching for Video Object Segmentation) [paper]

AAAI 2022:tada:

  • SITVOS (Siamese Network with Interactive Transformer for Video Object Segmentation) [paper]

CoRR 2022:tada:

  • ReferFormer (Language as Queries for Referring Video Object Segmentation) [paper]

🔖 Multiple Object Tracking (MOT)

CoRR 2021:tada:

  • RelationTrack (RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation) [paper]
  • TransTrack (TransTrack: Multiple Object Tracking with Transformer) [paper]
  • TrackFormer (TrackFormer: Multi-Object Tracking with Transformers) [paper]
  • TransMOT (TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking) [paper]
  • TransCenter (TransCenter: Transformers with Dense Queries for Multiple-Object Tracking) [paper]
  • MOTR (MOTR: End-to-End Multiple-Object Tracking with TRansformer) [paper]
  • MO3TR (Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers) [paper]

🔖 Object Re-Identification (ReID)

ICCV 2021:tada:

  • TransReID (TransReID: Transformer-based Object Re-Identification) [paper]

MM 2021:tada:

  • HAT (HAT: Hierarchical Aggregation Transformers for Person Re-identification) [paper]

CoRR 2021:tada:

  • TMT (A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification) [paper]
  • STT (Spatiotemporal Transformer for Video-based Person Re-identification) [paper]

🔖 Video Instance Segmentation (VIS)

CVPR 2021:tada:

  • VisTR (End-to-End Video Instance Segmentation with Transformers) [paper]

NeurIPS 2021:tada:

  • IFC (Video Instance Segmentation using Inter-Frame Communication Transformers) [paper]

CoRR 2021:tada:

  • QueryTrack (Tracking Instances as Queries) [paper]
  • Mask2Former (Mask2Former for Video Instance Segmentation) [paper]

🔖 Video Object Detection (VOD)

CoRR 2021:tada:

  • TransVOD (End-to-End Video Object Detection with Spatial-Temporal Transformers) [paper]

CoRR 2022:tada:

  • TransVOD++ (TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers) [paper]
