xiaoyeye1117 / Awesome-Vision-Transformer-Collection

Variants of Vision Transformer and its downstream tasks

Awesome Vision Transformer Collection

Variants of Vision Transformer and Vision Transformer for Downstream Tasks

author: Runwei Guan

affiliation: University of Liverpool / Xi'an Jiaotong-Liverpool University

email: thinkerai@foxmail.com

Backbone

Model Compression

  • A Unified Pruning Framework for Vision Transformers paper

Transfer Learning

  • Pre-Trained Image Processing Transformer paper code
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers paper code
  • BEVT: BERT Pretraining of Video Transformers paper
  • Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text paper

Multi-Modal

  • Multi-Modal Fusion Transformer for End-to-End Autonomous Driving paper
  • Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval paper
  • LAVT: Language-Aware Vision Transformer for Referring Image Segmentation paper
  • MTFNet: Mutual-Transformer Fusion Network for RGB-D Salient Object Detection paper
  • Visual-Semantic Transformer for Scene Text Recognition paper
  • Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text paper

Detection

  • YOLOS: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection paper code
  • WB-DETR: Transformer-Based Detector without Backbone paper
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers paper
  • TSP: Rethinking Transformer-based Set Prediction for Object Detection paper
  • DETR paper code
  • End-to-End Object Detection with Adaptive Clustering Transformer paper
  • An End-to-End Transformer Model for 3D Object Detection paper
  • End-to-End Human Object Interaction Detection with HOI Transformer paper code
  • Adaptive Image Transformer for One-Shot Object Detection paper
  • Improving 3D Object Detection With Channel-Wise Transformer paper
  • TransPose: Keypoint Localization via Transformer paper
  • Voxel Transformer for 3D Object Detection paper
  • Embracing Single Stride 3D Object Detector with Sparse Transformer paper
  • OW-DETR: Open-world Detection Transformer paper

Segmentation

  • MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers paper code
  • Line Segment Detection Using Transformers without Edges paper
  • VisTR: End-to-End Video Instance Segmentation with Transformers paper code
  • SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers paper code
  • Segmenter: Transformer for Semantic Segmentation paper
  • Fully Transformer Networks for Semantic Image Segmentation paper
  • SOTR: Segmenting Objects with Transformers paper code
  • GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic Segmentation paper
  • Masked-attention Mask Transformer for Universal Image Segmentation paper

Pose Estimation

  • Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation paper
  • HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation paper
  • End-to-End Human Pose and Mesh Reconstruction with Transformers paper code
  • PE-former: Pose Estimation Transformer paper
  • Pose Recognition with Cascade Transformers paper code
  • Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer code
  • Geometry-Contrastive Transformer for Generalized 3D Pose Transfer paper
  • Temporal Transformer Networks with Self-Supervision for Action Recognition paper
  • Co-training Transformer with Videos and Images Improves Action Recognition paper

Tracking

  • Transformer Tracking paper code
  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking paper code
  • MOTR: End-to-End Multiple-Object Tracking with TRansformer paper code
  • SwinTrack: A Simple and Strong Baseline for Transformer Tracking paper
  • Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network paper
  • PTTR: Relational 3D Point Cloud Object Tracking with Transformer paper

Generative Model

  • 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds paper
  • Spatial-Temporal Transformer for Dynamic Scene Graph Generation paper
  • THUNDR: Transformer-Based 3D Human Reconstruction With Markers paper
  • DoodleFormer: Creative Sketch Drawing with Transformers paper
  • Image Transformer paper
  • Taming Transformers for High-Resolution Image Synthesis paper code
  • TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up code
  • U2-Former: A Nested U-shaped Transformer for Image Restoration paper

Self-Supervised Learning

  • Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning paper code
  • iGPT paper code
  • An Empirical Study of Training Self-Supervised Vision Transformers paper code
  • Self-supervised Video Transformer paper
  • TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning paper

Explainable

  • Development and testing of an image transformer for explainable autonomous driving systems paper
  • Transformer Interpretability Beyond Attention Visualization paper code

Calibration

  • CTRL-C: Camera Calibration TRansformer With Line-Classification paper code

AI Medicine

  • Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer paper
  • 3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis paper
  • Hformer: Pre-training and Fine-tuning Transformers for fMRI Prediction Tasks paper
  • MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification paper
