Awesome-Attention-Mechanism-in-cv

Introduction
Attention Mechanism
Plug and Play Module
Vision Transformer
Contribute

Introduction

This repository provides PyTorch implementations of a variety of attention mechanisms used in network design for computer vision, along with a collection of plug-and-play modules. Because of limited time and energy, many modules may not be included yet.

Suggestions and improvements are welcome; please submit an issue or PR.

Attention Mechanism

Paper	Publish	Link	Main Idea	Blog
Global Second-order Pooling Convolutional Networks	CVPR19	GSoPNet	Combines higher-order statistics with attention in the middle layers of the network
Neural Architecture Search for Lightweight Non-Local Networks	CVPR20	AutoNL	NAS+LightNL
Squeeze-and-Excitation Networks	CVPR18	SENet	The classic channel attention	zhihu
Selective Kernel Networks	CVPR19	SKNet	SE + dynamic kernel selection	zhihu
Convolutional Block Attention Module	ECCV18	CBAM	Sequential spatial + channel attention	zhihu
Bottleneck Attention Module	BMVC18	BAM	Parallel spatial + channel attention	zhihu
Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks	MICCAI18	scSE	Parallel spatial + channel attention	zhihu
Non-local Neural Networks	CVPR18	Non-Local(NL)	self-attention	zhihu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond	ICCVW19	GCNet	Improves on NL	zhihu
CCNet: Criss-Cross Attention for Semantic Segmentation	ICCV19	CCNet	Improves on NL
SA-Net: Shuffle Attention for Deep Convolutional Neural Networks	ICASSP21	SANet	SGE + channel shuffle	zhihu
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks	CVPR20	ECANet	An improvement on SE
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks	CoRR19	SGENet	Group+spatial+channel
FcaNet: Frequency Channel Attention Networks	ICCV21	FcaNet	SE in the frequency domain
$A^2\text{-}Nets$: Double Attention Networks	NeurIPS18	DANet	Applies the NL idea to the spatial and channel dimensions
Asymmetric Non-local Neural Networks for Semantic Segmentation	ICCV19	APNB	SPP + NL
Efficient Attention: Attention with Linear Complexities	CoRR18	EfficientAttention	Reduces the computational cost of NL
Image Restoration via Residual Non-local Attention Networks	ICLR19	RNAN
Exploring Self-attention for Image Recognition	CVPR20	SAN	Theoretically rigorous yet simple to implement
An Empirical Study of Spatial Attention Mechanisms in Deep Networks	ICCV19	None	MSRA survey of self-attention
Object-Contextual Representations for Semantic Segmentation	ECCV20	OCRNet	Complex interaction mechanism with strong results
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification	TNNLS20	IAUNet	Introduces temporal information
ResNeSt: Split-Attention Networks	CoRR20	ResNeSt	SK + ResNeXt
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks	NeurIPS18	GENet	Follow-up to SE
Improving Convolutional Networks with Self-calibrated Convolutions	CVPR20	SCNet	Self-calibrated convolution
Rotate to Attend: Convolutional Triplet Attention Module	WACV21	TripletAttention	Pairwise interaction among the C, H, and W dimensions
Dual Attention Network for Scene Segmentation	CVPR19	DANet	self-attention
Relation-Aware Global Attention for Person Re-identification	CVPR20	RGANet	For ReID
Attentional Feature Fusion	WACV21	AFF	Attention-based feature fusion
An Attentive Survey of Attention Models	CoRR19	None	Covers attention mechanisms in NLP, CV, recommender systems, and more
Stand-Alone Self-Attention in Vision Models	NeurIPS19	FullAttention	Replaces all convolutions with self-attention
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation	ECCV18	BiSeNet	FPN-like feature fusion	zhihu
DCANet: Learning Connected Attentions for Convolutional Neural Networks	CoRR20	DCANet	Strengthens information flow between attention modules
An Empirical Study of Spatial Attention Mechanisms in Deep Networks	ICCV19	None	Targeted analysis of spatial attention
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition	CVPR17 Oral	RA-CNN	Fine-grained recognition
Guided Attention Network for Object Detection and Counting on Drones	ACM MM20	GANet	Targets object detection
Attention Augmented Convolutional Networks	ICCV19	AANet	Multi-head attention + additional feature maps
Global Self-Attention Networks for Image Recognition	ICLR21	GSA	A new global attention module
Attention-Guided Hierarchical Structure Aggregation for Image Matting	CVPR20	HAttMatting	Image matting: channel attention on high-level features, then spatial attention to guide low-level features
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks	ECCV20	None	Weight excitation mechanism complementary to SE
Expectation-Maximization Attention Networks for Semantic Segmentation	ICCV19 Oral	EMANet	EM+Attention
Dense-and-implicit attention network	AAAI20	DIANet	LSTM + feature sharing across blocks + channel attention
Coordinate Attention for Efficient Mobile Network Design	CVPR21	CoordAttention	Horizontal and vertical directions
Cross-channel Communication Networks	NIPS19	C3Net	GNN+SE
Gated Convolutional Networks with Hybrid Connectivity for Image Classification	AAAI20	HCGNet	Borrows some concepts from LSTM
Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network	AAAI19	None	Dropout+SE
BA^2M: A Batch Aware Attention Module for Image Classification	CVPR21	None	Builds attention across samples in a batch
EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network	CoRR21	EPSANet	Multi-scale
Stand-Alone Self-Attention in Vision Models	NIPS19	SASA	Non-Local variant
ResT: An Efficient Transformer for Visual Recognition	CoRR21	ResT	Self-attention variant
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition	ICME20	SPANet	Pyramid built from multiple AAP layers
Space-time Mixing Attention for Video Transformer	CoRR21	X-ViT (not released)	ViT + space-time attention
DMSANet: Dual Multi Scale Attention Network	CoRR21	Not released yet	Two scales + lightweight
CompConv: A Compact Convolution Module for Efficient Feature Learning	CoRR21	Not released yet	Res2Net + GhostNet
VOLO: Vision Outlooker for Visual Recognition	CoRR21	VOLO	Attention on ViT
Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism	CoRR21	Not released yet	Attention at the auxiliary-head level
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning	CoRR21	MUSE Attention	Improves SA for NLP
Polarized Self-Attention: Towards High-quality Pixel-wise Regression	CoRR21	PSA	Pixel-wise regression
CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation	TMI21	CA-Net	Spatial Attention
BAM: A Lightweight and Efficient Balanced Attention Mechanism for Single Image Super Resolution	CoRR21	BAM	Super resolution
Attention as Activation	CoRR21	ATAC	activation + attention
Region-based Non-local Operation for Video Classification	CoRR21	RNL	video classification
MSAF: Multimodal Split Attention Fusion	CoRR21	MSAF	MultiModal
All-Attention Layer	CoRR19	None	Transformer Layer
Compact Global Descriptor	CoRR20	CGD	add every two channel attention
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks	ICML21	SimAM	Brain-inspired neuron energy function
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution	ICCV19	OctConv	Improvement from a frequency perspective
Contextual Transformer Networks for Visual Recognition	ICCV21	CoTNet	Billed as a Transformer improvement, but in practice very close to Non-Local
Residual Attention: A Simple but Effective Method for Multi-Label Recognition	ICCV21	CSRA	For multi-label image recognition
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation	CVPR20	SEAM	Weakly supervised
An Attention Module for Convolutional Neural Networks	ICCV21	AW-Conv	Increases the capacity of the SE part
Attentive Normalization	Arxiv2020	None	BN+Attention
Person Re-identification via Attention Pyramid	TIP21	APNet	Attention pyramid + ReID
Unifying Nonlocal Blocks for Neural Networks	ICCV21	SNL	Non-Local + graph spectral concepts
Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context	ICCVW21	None	Spatial+Channel
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network	ICCVW21	PP-NAS	Searches for plug-and-play modules
Distilling Knowledge via Knowledge Review	CVPR21	ReviewKD	Knowledge distillation + spatial attention
Dynamic Region-Aware Convolution	CVPR21	None	Dynamically generates convolution kernels
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation	CVPR21	None	STN-GRU
Introvert: Human Trajectory Prediction via Conditional 3D Attention	CVPR21	None	3D Attention
SSAN: Separable Self-Attention Network for Video Representation Learning	CVPR21	None	SSAN for video
Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation	CVPR21	DANet	Few-Shot Video Segmentation
A2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation	CVPR21	None	FPN+Attention
Image Super-Resolution with Non-Local Sparse Attention	CVPR21	None	SR+Non local
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection	CVPR21	LaneATT	Lane Detection
NAM: Normalization-based Attention Module	CoRR21	NAM	Normalization+Attention
NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification	MICCAI20	NAS-SCAM	Attention Search
NASABN: A Neural Architecture Search Framework for Attention-Based Networks	IJCNN20	None	NLP+NAS
Att-DARTS: Differentiable Neural Architecture Search for Attention	IJCNN20	Att-Darts	Darts+AttentionSearch
On the Integration of Self-Attention and Convolution	CoRR21	ACMix	self attention+conv
BoxeR: Box-Attention for 2D and 3D Transformers	CoRR21	None	Object detection + attention
CoAtNet: Marrying Convolution and Attention for All Data Sizes	NIPS21	coatnet	VIT
Pay Attention to MLPs	NIPS21	gmlp	MLP
IC-Conv: Inception Convolution With Efficient Dilation Search	CVPR21 Oral	IC-Conv	Dilation rate search
SRM: A Style-based Recalibration Module for Convolutional Neural Networks	ICCV19	SRM	Style-based recalibration attention
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition	ICME20	SPANet	SE+SP
Competitive Inner-Imaging Squeeze and Excitation for Residual Network	CoRR18	Competitive-SENet	Incorporates skip-connection information
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks	WACV20	ULSAM	Spatial attention
Augmenting Convolutional networks with attention-based aggregation	CoRR21	None	Adds linear attention on top of the ViT paradigm
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification	AAAI21	CAP	Combines anchors, LSTM, SE, etc. to build attention for fine-grained recognition
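
To make the channel-attention entries in the table above more concrete (SENet in particular), here is a minimal PyTorch sketch of a squeeze-and-excitation style block. It is an illustration only, not the code shipped in this repository: the class name `SEBlock`, the `reduction` default of 16, and the tensor sizes in the usage lines are assumptions.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Illustrative squeeze-and-excitation style channel attention (hypothetical name)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)      # assumed bottleneck width
        self.pool = nn.AdaptiveAvgPool2d(1)         # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.fc = nn.Sequential(                    # excitation: small MLP over channels
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),                           # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        gates = self.fc(self.pool(x).view(n, c))    # (N, C)
        return x * gates.view(n, c, 1, 1)           # rescale every channel of the input


# Usage: the block keeps the input shape, so it can be dropped after any conv stage.
x = torch.randn(2, 64, 8, 8)
y = SEBlock(64)(x)
print(y.shape)
```

Because the output shape equals the input shape, a block like this can be inserted into an existing backbone without changing the surrounding layers, which is what "plug and play" refers to in this list.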

Dynamic Networks

Title	Publish	Github	Main Idea
Dynamic Neural Networks: A Survey	CoRR21	None	Survey
CondConv: Conditionally Parameterized Convolutions for Efficient Inference	NIPS19	CondConv	Kernel parameters are derived from a transformation of the input
DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks	CoRR20	None	Learns a set of kernel coefficients that fuse multiple fixed kernels into one dynamic kernel
Dynamic Convolution: Attention over Convolution Kernels	CVPR20	Dynamic-convolution-Pytorch	Fuses multiple kernels to boost representational power
WeightNet: Revisiting the Design Space of Weight Network	ECCV20	weightNet	Unifies SENet and CondConv
Dynamic Filter Networks
Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution
SkipNet: Learning Dynamic Routing in Convolutional Networks
Pay Less Attention with Lightweight and Dynamic Convolutions
Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations
Dynamic Group Convolution for Accelerating Convolutional Neural Networks	ECCV20	dgc	Group-wise locality
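
The "fuse several fixed kernels into one input-dependent kernel" idea behind the CondConv, DyNet, and Dynamic Convolution entries above can be sketched roughly as follows. This is a simplified sketch under stated assumptions, not any of the papers' reference implementations: the names `DynamicConv2d`, `num_kernels`, and `temperature` are hypothetical, there is no bias term, and an odd kernel size with "same" padding is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Illustrative attention over K convolution kernels (hypothetical name)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, temperature=1.0):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        self.temperature = temperature
        # K candidate kernels, each of shape (out_ch, in_ch, k, k).
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02
        )
        self.attn = nn.Linear(in_ch, num_kernels)   # scores one weight per kernel

    def forward(self, x):
        n, c, h, w = x.shape
        pooled = x.mean(dim=(2, 3))                                      # (N, C)
        alpha = F.softmax(self.attn(pooled) / self.temperature, dim=1)   # (N, K)
        # Per-sample mixture of the K kernels: (N, K) x (K, O, I, k, k) -> (N, O, I, k, k).
        mixed = torch.einsum("nk,koihw->noihw", alpha, self.weight)
        mixed = mixed.reshape(n * self.out_ch, self.in_ch, self.k, self.k)
        # Fold the batch into groups so that each sample is convolved with its own kernel.
        out = F.conv2d(x.reshape(1, n * c, h, w), mixed, padding=self.k // 2, groups=n)
        return out.reshape(n, self.out_ch, h, w)


# Usage: behaves like a 3x3 "same" convolution whose kernel depends on the input.
layer = DynamicConv2d(8, 16)
print(layer(torch.randn(3, 8, 10, 10)).shape)
```

The grouped-convolution trick avoids a Python loop over the batch; each sample's result is the same as running the layer on that sample alone.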

Plug and Play Module

Title	Publish	Github	Main Idea
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks	ICCV19	ACNet	Re-parameterization
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs	TPAMI18	ASPP	Atrous (dilated) convolution
MixConv: Mixed Depthwise Convolutional Kernels	BMVC19	MixedConv	Convolutions with different kernel sizes
Pyramid Scene Parsing Network	CVPR17	PSP	Pyramid pooling
Receptive Field Block Net for Accurate and Fast Object Detection	ECCV18	RFB	Dilated convolution
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing	CVPR20	SPNet	Pooling along two directions
SSH: Single Stage Headless Face Detector	ICCV17	SSH	The simplest receptive-field module
GhostNet: More Features from Cheap Operations	CVPR20	GhostNet	Simple yet effective
SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping	TIP21	SlimConv	Flip operation + SE
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	ICML19	EfficientNet	Excellent network building block
CondConv: Conditionally Parameterized Convolutions for Efficient Inference	NIPS19	CondConv	Dynamic convolution
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network	ICCVW21	PPNAS	Search over inter-group connections
Dynamic Convolution: Attention over Convolution Kernels	CVPR20	DynamicConv	Dynamic filters
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer	ECCV20	PSConv	Fine-grained multi-scale
DCANet: Dense Context-Aware Network for Semantic Segmentation	ECCV20	DCANet	Attention
Enhancing feature fusion for human pose estimation	MVA20	SEB	Feature fusion
Object-Contextual Representations for Semantic Segmentation	ECCV20	HRNet-OCR	OCR module
DO-Conv: Depthwise Over-parameterized Convolutional Layer	CoRR20	DO-Conv	over-parameterized Conv
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition	CoRR20	PyConv	Convolutions with different kernel sizes
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks	WACV20	ULSAM	Spatial attention
Dynamic Group Convolution for Accelerating Convolutional Neural Networks	ECCV20	DGC	Dynamic group convolution
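
The "re-parameterization" noted for ACNet in the table above can be illustrated with a small sketch: parallel 3x3, 1x3, and 3x1 convolutions used during training are merged into a single 3x3 kernel for inference. This is a simplified sketch, not the ACNet reference code. The names `ACBlock` and `fused_kernel` are hypothetical, and the batch normalization layers that a full implementation would also need to fold into the kernel are left out here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ACBlock(nn.Module):
    """Illustrative asymmetric convolution block without BN (hypothetical name)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.square = nn.Conv2d(in_ch, out_ch, (3, 3), padding=(1, 1), bias=False)
        self.hor = nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, 1), bias=False)
        self.ver = nn.Conv2d(in_ch, out_ch, (3, 1), padding=(1, 0), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training-time form: three parallel branches summed together.
        return self.square(x) + self.hor(x) + self.ver(x)

    def fused_kernel(self) -> torch.Tensor:
        # Inference-time form: add the 1x3 and 3x1 kernels onto the
        # middle row and middle column of the 3x3 kernel.
        k = self.square.weight.clone()
        k = k + F.pad(self.hor.weight, (0, 0, 1, 1))   # pad height: 1x3 -> 3x3
        k = k + F.pad(self.ver.weight, (1, 1, 0, 0))   # pad width:  3x1 -> 3x3
        return k


# Usage: the single fused 3x3 convolution reproduces the three-branch output.
block = ACBlock(4, 6)
x = torch.randn(2, 4, 9, 9)
fused_out = F.conv2d(x, block.fused_kernel(), padding=1)
print(torch.allclose(block(x), fused_out, atol=1e-5))
```

The merge works because convolution is linear in the kernel, so summing branch outputs is equivalent to summing zero-padded kernels.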

Vision Transformer

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT

[paper] [Github]

Title	Publish	Github	Main Idea
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	ICCV21	SwinT
CPVT: Conditional Positional Encodings for Vision Transformer	CoRR21	CPVT
GLiT: Neural Architecture Search for Global and Local Image Transformer	CoRR21	GLiT	NAS
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases	CoRR21	ConViT	GPSA
CeiT: Incorporating Convolution Designs into Visual Transformers	CoRR21	CeiT	LCA,LeFF
BoTNet: Bottleneck Transformers for Visual Recognition	CVPR21	BoTNet	NonBlock-like
CvT: Introducing Convolutions to Vision Transformers	ICCV21	CvT	projection
TransCNN: Transformer in Convolutional Neural Networks	CoRR21	TransCNN
ResT: An Efficient Transformer for Visual Recognition	CoRR21	ResT
CoaT: Co-Scale Conv-Attentional Image Transformers	CoRR21	CoaT
ConTNet: Why not use convolution and transformer at the same time?	CoRR21	ConTNet
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification	NIPS21	DynamicViT
DVT: Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition	NIPS21	DVT
CoAtNet: Marrying Convolution and Attention for All Data Sizes	CoRR21	CoAtNet
Early Convolutions Help Transformers See Better	CoRR21	None
Compact Transformers: Escaping the Big Data Paradigm with Compact Transformers	CoRR21	CCT
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer	CoRR21	MobileViT
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference	CoRR21	LeViT
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer	CoRR21	ShuffleTransformer
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias	CoRR21	ViTAE
LocalViT: Bringing Locality to Vision Transformers	CoRR21	LocalViT
DeiT: Training data-efficient image transformers & distillation through attention	ICML21	DeiT
CaiT: Going deeper with Image Transformers	ICCV21	CaiT
Efficient Training of Visual Transformers with Small-Size Datasets	NIPS21	None
Vision Transformer with Deformable Attention	CoRR22	DAT	DeformConv+SA
MaxViT: Multi-Axis Vision Transformer	CoRR22	None	dilated attention
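
As a rough illustration of what the ViT entry at the top of this section does with an image, the sketch below splits an image into 16x16 patches, embeds each patch as a token, and runs one multi-head self-attention layer over the tokens. It is a minimal sketch under assumptions: the names `PatchEmbed` and `SelfAttention`, the embedding size of 64, and the 4 heads are illustrative, and the class token, position embeddings, LayerNorm, and MLP of a full ViT block are omitted.

```python
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Illustrative patch embedding: one token per non-overlapping patch."""

    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 64):
        super().__init__()
        # A conv with kernel = stride = patch size is a linear map applied per patch.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).flatten(2).transpose(1, 2)     # (N, num_patches, dim)


class SelfAttention(nn.Module):
    """Illustrative multi-head scaled dot-product self-attention."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        n, l, d = t.shape
        hd = d // self.heads
        qkv = self.qkv(t).reshape(n, l, 3, self.heads, hd)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each (N, heads, L, hd)
        attn = (q @ k.transpose(-2, -1)) / hd ** 0.5       # (N, heads, L, L)
        attn = attn.softmax(dim=-1)                        # each row sums to 1
        return self.out((attn @ v).transpose(1, 2).reshape(n, l, d))


# Usage: a 64x64 image yields (64 / 16) ** 2 = 16 tokens.
img = torch.randn(2, 3, 64, 64)
tokens = PatchEmbed()(img)
print(tokens.shape, SelfAttention()(tokens).shape)
```

Many of the variants listed in the table change one of these two pieces, for example by restricting attention to windows or by replacing the patch projection with a convolutional stem.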

Contribute

Additions are welcome: please open an issue with the paper and the corresponding code link.

Thanks to @dedekinds for pointing out a problem in the DIANet description.

https://programmathically.com/understanding-padding-and-stride-in-convolutional-neural-networks/

cenchaojun / awesome-attention-mechanism-in-cv