# Awesome-Vision-MLPs
Collecting papers on MLP architectures for Computer Vision (CV). If you find any missing papers, please open an issue or a pull request.
## Papers
### Transformer original paper
- Attention is All You Need (NIPS 2017)
- Attention is not all you need: pure attention loses rank doubly exponentially with depth - 2021.05.05
### Technical blogs
- [Chinese Blog] A 30,000-character long-form introduction to Vision Transformers [Link]
- [Chinese Blog] Vision Transformer and Vision MLP explained in detail (principle analysis + code walkthrough) (table of contents) [Link]
### Surveys
- Transformers in Vision: A Survey [paper] - 2021.02.22
- A Survey on Visual Transformer [paper] - 2020.1.30
### arXiv papers
- Are we ready for a new paradigm shift? A Survey on Visual Deep MLP [paper]
- An Image Patch is a Wave: Phase-Aware Vision MLP [paper]
- MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [paper]
- Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [paper]
- ConvMLP: Hierarchical Convolutional MLPs for Vision [paper] [code]
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [paper]
- Hire-MLP: Vision MLP via Hierarchical Rearrangement [paper]
- RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [paper]
- S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [paper]
- CycleMLP: A MLP-like Architecture for Dense Prediction [paper] [code]
- AS-MLP: An Axial Shifted MLP Architecture for Vision [paper] [code]
- Global Filter Networks for Image Classification [paper] [code]
- What Makes for Hierarchical Vision Transformer? [paper]
- Rethinking Token-Mixing MLP for MLP-based Vision Backbone [paper]
- [Vision Permutator] Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [paper] [code]
- [S2-MLP] S2-MLP: Spatial-Shift MLP Architecture for Vision [paper]
- [Graph-MLP] Graph-MLP: Node Classification without Message Passing in Graph [paper]
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations [paper]
- [Container] Container: Context Aggregation Network [paper]
- Can Attention Enable MLPs To Catch Up With CNNs? [paper]
- [MixerGAN] MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [paper]
- Less is More: Pay Less Attention in Vision Transformers [paper]
- [ResMLP] ResMLP: Feedforward networks for image classification with data-efficient training [paper]
- Pay Attention to MLPs [paper]
- Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [paper] [code]
- [RepMLP] RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [paper] [code]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [paper]
- [MLP-Mixer] MLP-Mixer: An all-MLP Architecture for Vision [paper]
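Many of the architectures above build on the same basic idea introduced by MLP-Mixer: alternate a token-mixing MLP (applied across patches) with a channel-mixing MLP (applied across features), each wrapped in layer normalization and a skip connection. The following is a minimal NumPy sketch of one such Mixer layer for illustration only; the hidden sizes and initialization are arbitrary choices, not the authors' reference implementation.

```python
# Illustrative sketch of one MLP-Mixer layer (not the official code).
# Input x has shape (num_tokens, channels): one row per image patch.
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each row over its last axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with a tanh-approximated GELU, along the last axis.
    h = x @ w1 + b1
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2 + b2

def mixer_layer(x, token_params, channel_params):
    # Token mixing: transpose so the MLP mixes information across patches.
    y = x + mlp(layer_norm(x).T, *token_params).T
    # Channel mixing: the MLP mixes features within each patch.
    return y + mlp(layer_norm(y), *channel_params)

rng = np.random.default_rng(0)
num_tokens, channels, hidden = 16, 32, 64
x = rng.standard_normal((num_tokens, channels))
# Token-mixing weights act on the patch axis (num_tokens -> hidden -> num_tokens).
token_params = (rng.standard_normal((num_tokens, hidden)) * 0.02,
                np.zeros(hidden),
                rng.standard_normal((hidden, num_tokens)) * 0.02,
                np.zeros(num_tokens))
# Channel-mixing weights act on the feature axis (channels -> hidden -> channels).
channel_params = (rng.standard_normal((channels, hidden)) * 0.02,
                  np.zeros(hidden),
                  rng.standard_normal((hidden, channels)) * 0.02,
                  np.zeros(channels))
out = mixer_layer(x, token_params, channel_params)
print(out.shape)  # (16, 32): same shape as the input
```

Because every operation is a plain matrix multiply over a fixed token count, the layer keeps the input shape, which is why Mixer-style models stack many identical layers; variants such as S2-MLP, CycleMLP, and AS-MLP mainly replace the token-mixing step with shifts or other spatial rearrangements.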