# Awesome-Vision-MLPs
Collecting papers on MLP architectures for Computer Vision (CV). If you find any missing papers, please open an issue or a pull request.
## Papers
### Transformer original paper
- Attention is All You Need (NIPS 2017)
- Attention is not all you need: pure attention loses rank doubly exponentially with depth - 2021.05.05
### Technical blogs
- [Chinese Blog] A 30,000-character long-form introduction to Vision Transformers [Link]
- [Chinese Blog] Vision Transformer and Vision MLP explained in detail (principle analysis + code walkthrough) (table of contents) [Link]
### Surveys
- Transformers in Vision: A Survey [paper] - 2021.02.22
- A Survey on Visual Transformer [paper] - 2020.1.30
### arXiv papers
- Are we ready for a new paradigm shift? A Survey on Visual Deep MLP [paper]
- An Image Patch is a Wave: Phase-Aware Vision MLP [paper]
- MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [paper]
- Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [paper]
- ConvMLP: Hierarchical Convolutional MLPs for Vision [paper] [code]
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [paper]
- Hire-MLP: Vision MLP via Hierarchical Rearrangement [paper]
- RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [paper]
- S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [paper]
- CycleMLP: A MLP-like Architecture for Dense Prediction [paper] [code]
- AS-MLP: An Axial Shifted MLP Architecture for Vision [paper] [code]
- Global Filter Networks for Image Classification [paper] [code]
- What Makes for Hierarchical Vision Transformer? [paper]
- Rethinking Token-Mixing MLP for MLP-based Vision Backbone [paper]
- [Vision Permutator] Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [paper] [code]
- [S2-MLP] S2-MLP: Spatial-Shift MLP Architecture for Vision [paper]
- [Graph-MLP] Graph-MLP: Node Classification without Message Passing in Graph [paper]
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations [paper]
- [Container] Container: Context Aggregation Network [paper]
- Can Attention Enable MLPs To Catch Up With CNNs? [paper]
- [MixerGAN] MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [paper]
- Less is More: Pay Less Attention in Vision Transformers [paper]
- [ResMLP] ResMLP: Feedforward networks for image classification with data-efficient training [paper]
- Pay Attention to MLPs [paper]
- Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [paper] [code]
- [RepMLP] RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [paper] [code]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [paper]
- [MLP-Mixer] MLP-Mixer: An all-MLP Architecture for Vision [paper]
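Many of the architectures above build on the same basic idea introduced by MLP-Mixer: alternate a token-mixing MLP (applied across patches) with a channel-mixing MLP (applied across features), each wrapped in layer normalization and a skip connection. The following is a minimal NumPy sketch of one such Mixer layer for illustration only; the hidden sizes and initialization are arbitrary choices, not the authors' reference implementation.

```python
# Illustrative sketch of one MLP-Mixer layer (not the official code).
# Input x has shape (num_tokens, channels): one row per image patch.
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each row over its last axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with a tanh-approximated GELU, along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_layer(x, token_params, channel_params):
    # Token mixing: transpose so the MLP mixes information across patches.
    y = x + mlp(layer_norm(x).T, *token_params).T
    # Channel mixing: the MLP mixes features within each patch.
    return y + mlp(layer_norm(y), *channel_params)

rng = np.random.default_rng(0)
num_tokens, channels, hidden = 16, 32, 64
x = rng.standard_normal((num_tokens, channels))
# Token-mixing weights act on the patch axis (num_tokens -> hidden -> num_tokens).
token_params = (rng.standard_normal((num_tokens, hidden)) * 0.02,
                np.zeros(hidden),
                rng.standard_normal((hidden, num_tokens)) * 0.02,
                np.zeros(num_tokens))
# Channel-mixing weights act on the feature axis (channels -> hidden -> channels).
channel_params = (rng.standard_normal((channels, hidden)) * 0.02,
                  np.zeros(hidden),
                  rng.standard_normal((hidden, channels)) * 0.02,
                  np.zeros(channels))
out = mixer_layer(x, token_params, channel_params)
print(out.shape)  # (16, 32): same shape as the input
```

Because every operation is a plain matrix multiply over a fixed token count, the layer keeps the input shape, which is why Mixer-style models stack many identical layers; variants such as S2-MLP, CycleMLP, and AS-MLP mainly replace the token-mixing step with shifts or other spatial rearrangements.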