Ultimate-Awesome-Transformer-Attention

This repo contains a comprehensive list of papers on Vision Transformers and Attention, along with their code and related websites.
This list is maintained by Min-Hung Chen and is actively updated.

If you find papers that are missing from this list, feel free to create a pull request, open an issue, or email me.
Contributions in any form to make this list more comprehensive are welcome.

If you find this repository useful, please consider citing and starring this list.
Feel free to share this list with others!

[Update: July, 2022] Added all the related papers from ICML 2022!
[Update: June, 2022] Added all the related papers from CVPR 2022!

Overview

Survey
Image Classification / Backbone
Detection
Segmentation
Video (High-level)
Multi-Modality
Other High-level Vision Tasks
Transfer / X-Supervised / X-Shot / Continual Learning
Low-level Vision Tasks
Reinforcement Learning
- Navigation
- Other RL Tasks
Medical
Other Tasks
Attention Mechanisms in Vision/NLP
- Attention for Vision
- Attention for NLP
- Attention for Both
- Attention for Others
Citation
References

Survey

"A Survey on Visual Transformer", TPAMI, 2022 (Huawei). [Paper]
"A Comprehensive Study of Vision Transformers on Dense Prediction Tasks", VISAP, 2022 (NavInfo Europe, Netherlands). [Paper]
"Vision-and-Language Pretrained Models: A Survey", IJCAI, 2022 (The University of Sydney). [Paper]
"Vision Transformers: State of the Art and Research Challenges", arXiv, 2022 (NYCU). [Paper]
"Transformers in Medical Imaging: A Survey", arXiv, 2022 (MBZUAI). [Paper][GitHub]
"Multimodal Learning with Transformers: A Survey", arXiv, 2022 (Oxford). [Paper]
"Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives", arXiv, 2022 (CAS). [Paper]
"Transformers in 3D Point Clouds: A Survey", arXiv, 2022 (University of Waterloo). [Paper]
"A survey on attention mechanisms for medical applications: are we moving towards better algorithms?", arXiv, 2022 (INESC TEC and University of Porto, Portugal). [Paper]
"Efficient Transformers: A Survey", arXiv, 2022 (Google). [Paper]
"Are we ready for a new paradigm shift? A Survey on Visual Deep MLP", arXiv, 2022 (Tsinghua). [Paper]
"Vision Transformers in Medical Computer Vision - A Contemplative Retrospection", arXiv, 2022 (National University of Sciences and Technology (NUST), Pakistan). [Paper]
"Video Transformers: A Survey", arXiv, 2022 (Universitat de Barcelona, Spain). [Paper]
"Transformers in Medical Image Analysis: A Review", arXiv, 2022 (Nanjing University). [Paper]
"Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work", arXiv, 2022 (?). [Paper]
"Transformers Meet Visual Learning Understanding: A Comprehensive Review", arXiv, 2022 (Xidian University). [Paper]
"Image Captioning In the Transformer Age", arXiv, 2022 (Alibaba). [Paper][GitHub]
"Visual Attention Methods in Deep Learning: An In-Depth Survey", arXiv, 2022 (Fayoum University, Egypt). [Paper]
"Transformers in Vision: A Survey", ACM Computing Surveys, 2021 (MBZUAI). [Paper]
"Survey: Transformer based Video-Language Pre-training", arXiv, 2021 (Renmin University of China). [Paper]
"A Survey of Transformers", arXiv, 2021 (Fudan). [Paper]
"A Survey of Visual Transformers", arXiv, 2021 (CAS). [Paper]
"Attention mechanisms and deep learning for machine vision: A survey of the state of the art", arXiv, 2021 (University of Kashmir, India). [Paper]
