Transformer-in-Vision

A collection of recent Transformer-based computer vision (CV) and related works. Comments and contributions are welcome!

The Transformer is now a basic component adopted in nearly all AI models. This list is updated irregularly.

New Hope: LLM-in-Vision

Resource

ChatGPT for Robotics: Design Principles and Model Abilities, [Paper], [Code]
DiffusionDB [Page], [Paper]
LAION-5B [Page], [Paper]
LAVIS [Page], [Paper]
Imagen Video [Page], [Paper]
Phenaki [Page], [Paper]
DreamFusion [Page], [Paper]
Make-A-Video [Page], [Paper]
Stable Diffusion [Page], [Paper]
NUWA-Infinity [Page], [Paper]
Parti [Page], [Code]
Imagen [Page], [Paper]
Gato: A Generalist Agent, [Paper]
PaLM: Scaling Language Modeling with Pathways, [Paper]
DALL·E 2 [Page], [Paper]
SCENIC: A JAX Library for Computer Vision Research and Beyond, [Code]
V-L joint learning study (with good tables): [METER], [Kaleido-BERT]
Attention is all you need, [Paper]
CLIP [Page], [Paper], [Code], [arXiv]
DALL·E [Page], [Code], [Paper]
huggingface/transformers
Kyubyong/transformer (TensorFlow)
jadore801120/attention-is-all-you-need-pytorch (PyTorch)
krasserm/fairseq-image-captioning
PyTorch Transformers Tutorials
ictnlp/awesome-transformer
basicv8vc/awesome-transformer
dk-liang/Awesome-Visual-Transformer
yuewang-cuhk/awesome-vision-language-pretraining-papers

Survey

(arXiv 2023.2) Transformer-Based Sensor Fusion for Autonomous Driving: A Survey, [Paper], [Page]
(arXiv 2023.2) Deep Learning for Video-Text Retrieval: a Review, [Paper]
(arXiv 2023.2) Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey, [Paper]
(arXiv 2023.2) Transformer-based Generative Adversarial Networks in Computer Vision: A Comprehensive Survey, [Paper]
(arXiv 2023.2) Knowledge Distillation in Vision Transformers: A Critical Review, [Paper]
(arXiv 2023.2) A Survey on Efficient Training of Transformers, [Paper]
(arXiv 2023.1) ChatGPT is not all you need. A State of the Art Review of large Generative AI models, [Paper]
(arXiv 2022.12) Transformers in Action Recognition: A Review on Temporal Modeling, [Paper]
(arXiv 2022.11) Vision Transformers in Medical Imaging: A Review, [Paper]
(arXiv 2022.11) A survey on knowledge-enhanced multimodal learning, [Paper]
(arXiv 2022.10) Vision-Language Pre-training: Basics, Recent Advances, and Future Trends, [Paper]
(arXiv 2022.10) A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective, [Paper]
(arXiv 2022.09) Vision Transformers for Action Recognition: A Survey, [Paper]
(arXiv 2022.09) Transformers in Remote Sensing: A Survey, [Paper], [Code]
(arXiv 2022.08) 3D Vision with Transformers: A Survey, [Paper], [Code]
(arXiv 2022.08) A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond, [Paper]
(arXiv 2022.07) Vision Transformers: State of the Art and Research Challenges, [Paper]
(arXiv 2022.07) Self-Supervised Learning for Videos: A Survey, [Paper]
(arXiv 2022.06) Multimodal Learning with Transformers: A Survey, [Paper]
(arXiv 2022.05) Vision Transformer: Vit and its Derivatives, [Paper]
(arXiv 2022.05) Transformers in 3D Point Clouds: A Survey, [Paper]
(arXiv 2022.04) Visual Attention Methods in Deep Learning: An In-Depth Survey, [Paper]
(arXiv 2022.04) Vision-and-Language Pretrained Models: A Survey, [Paper]
(arXiv 2022.03) A Roadmap for Big Model, [Paper]
(arXiv 2022.03) Transformers Meet Visual Learning Understanding: A Comprehensive Review, [Paper](https://arxiv.org/pdf/2203.12944.pdf)
(arXiv 2022.03) Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work, [Paper], [Project]
(arXiv 2022.02) A Survey of Vision-Language Pre-Trained Models, [Paper]
(arXiv 2022.02) VLP: A Survey on Vision-Language Pre-training, [Paper]
(arXiv 2022.02) Transformer for Graphs: An Overview from Architecture Perspective, [Paper]
(arXiv 2022.01) Video Transformers: A Survey, [Paper]
(arXiv 2021.11) Are We Ready for a New Paradigm Shift? A Survey on Visual Deep MLP, [Paper]
(arXiv 2021.11) A Survey of Visual Transformers, [Paper]
(arXiv 2021.09) Survey: Transformer based Video-Language Pre-training, [Paper]
(arXiv 2021.06) A Survey of Transformers, [Paper]
(arXiv 2021.06) Attention mechanisms and deep learning for machine vision: A survey of the state of the art, [Paper]
(arXiv 2021.06) Pre-Trained Models: Past, Present and Future, [Paper]
(arXiv 2021.05) Can Attention Enable MLPs To Catch Up With CNNs? [Paper]
(arXiv 2021.03) A Practical Survey on Faster and Lighter Transformers, [Paper]
(arXiv 2021.03) Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision, [Paper]
(arXiv 2021.01) A Survey on Visual Transformer, [Paper]
(arXiv 2020.9) Efficient Transformers: A Survey, [Paper]
(arXiv 2020.1) Transformers in Vision: A Survey, [Paper]

DirtyHarryLYL / Transformer-in-Vision
