There are 12 repositories under the efficient-attention topic.
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
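As a rough illustration of the idea (not this repository's fused kernels), the sketch below quantizes Q and K per token to INT8 and rescales the resulting attention logits with the quantization scales; the helper names are hypothetical, and the real speedup comes from running the INT8 matmul on tensor cores rather than dequantizing in PyTorch as done here.

```python
import torch

def quantize_per_token_int8(x):
    # Hypothetical helper: symmetric per-token INT8 quantization of the last
    # dimension, returning the quantized tensor and per-token scales.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return torch.round(x / scale).to(torch.int8), scale

def int8_qk_logits(q, k):
    # Reference sketch: attention logits from INT8 Q/K, rescaled with the
    # per-token scales. q, k: (tokens, head_dim) float tensors.
    q_i8, q_scale = quantize_per_token_int8(q)
    k_i8, k_scale = quantize_per_token_int8(k)
    logits = q_i8.float() @ k_i8.float().T              # (tokens, tokens)
    return logits * q_scale * k_scale.T / q.shape[-1] ** 0.5
```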
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
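A minimal single-device sketch of the ring schedule, assuming non-causal attention: each rank would hold one query block and pass its key/value block around the ring, accumulating results with an online softmax; here the ring is simulated by looping over K/V chunks.

```python
import torch

def ring_attention_sim(q, k, v, num_chunks=4):
    # Simulated ring pass: in the distributed version each iteration would use
    # the K/V block received from the neighboring device. q, k, v: (tokens, dim).
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((q.shape[0], 1), float("-inf"), device=q.device, dtype=q.dtype)
    denom = torch.zeros(q.shape[0], 1, device=q.device, dtype=q.dtype)
    for k_blk, v_blk in zip(k.chunk(num_chunks), v.chunk(num_chunks)):
        scores = q @ k_blk.T * scale                   # logits against this block
        blk_max = scores.amax(dim=-1, keepdim=True)
        new_max = torch.maximum(row_max, blk_max)
        correction = torch.exp(row_max - new_max)      # rescale earlier partial sums
        probs = torch.exp(scores - new_max)
        denom = denom * correction + probs.sum(dim=-1, keepdim=True)
        out = out * correction + probs @ v_blk
        row_max = new_max
    return out / denom
```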
Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch
PyTorch implementation of Efficient Infinite Context Transformers with Infini-attention, plus a QwenMoE implementation, a training script, and 1M-context passkey retrieval
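A minimal sketch of the compressive-memory read/update that Infini-attention carries across segments, assuming single-head tensors and the simple linear memory update, and omitting the learned gate that mixes the memory readout with local attention; names are illustrative, not this repository's API.

```python
import torch
import torch.nn.functional as F

def elu1(x):
    return F.elu(x) + 1  # nonlinearity applied to queries/keys for memory access

def infini_memory_step(q, k, v, mem, z):
    # One segment: read from the compressive memory, then fold this segment's
    # keys/values into it. q, k: (tokens, d_k); v: (tokens, d_v);
    # mem: (d_k, d_v); z: (d_k,) running normalizer.
    sq, sk = elu1(q), elu1(k)
    read = (sq @ mem) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)  # memory readout
    mem = mem + sk.transpose(0, 1) @ v                          # associative update
    z = z + sk.sum(dim=0)
    return read, mem, z
```

The state would start as `mem = torch.zeros(d_k, d_v)` and `z = torch.zeros(d_k)` and be threaded through successive segments of the long input.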
The official PyTorch implementation for CascadedGaze: Efficiency in Global Context Extraction for Image Restoration, TMLR'24.
Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention".
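A minimal non-causal sketch of the cos-based re-weighting, assuming single-head inputs: the ReLU feature map and the cos(i - j) weight are decomposed into per-token cosine and sine factors so the key-value product is formed once, giving linear cost in sequence length.

```python
import math
import torch
import torch.nn.functional as F

def cosformer_attention(q, k, v):
    # q, k, v: (batch, tokens, dim); multi-head handling omitted for brevity.
    b, t, d = q.shape
    idx = torch.arange(t, device=q.device, dtype=q.dtype)
    # cos((pi/2) * (i - j) / M) = cos_i * cos_j + sin_i * sin_j, with M = t
    cos_w = torch.cos(math.pi / 2 * idx / t)[None, :, None]
    sin_w = torch.sin(math.pi / 2 * idx / t)[None, :, None]
    q, k = F.relu(q), F.relu(k)
    q_cos, q_sin = q * cos_w, q * sin_w
    k_cos, k_sin = k * cos_w, k * sin_w
    num = q_cos @ (k_cos.transpose(1, 2) @ v) + q_sin @ (k_sin.transpose(1, 2) @ v)
    den = q_cos @ k_cos.sum(dim=1, keepdim=True).transpose(1, 2) \
        + q_sin @ k_sin.sum(dim=1, keepdim=True).transpose(1, 2)
    return num / den.clamp(min=1e-6)
```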
PyTorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
Implementation of Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
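A minimal sketch of the Hydra attention operation from the linked paper, assuming the number of heads equals the feature dimension, which collapses attention to elementwise products and a single sum over tokens (linear in sequence length).

```python
import torch
import torch.nn.functional as F

def hydra_attention(q, k, v):
    # q, k, v: (batch, tokens, dim). With heads == dim, the softmax kernel is
    # replaced by cosine similarity and the computation becomes elementwise.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    kv = (k * v).sum(dim=1, keepdim=True)   # aggregate keys * values once: O(T * d)
    return q * kv                           # broadcast the global summary to each query
```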
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
Nonparametric Modern Hopfield Models
Resources and references on solved and unsolved problems in attention mechanisms.
Minimal implementation of Samba by Microsoft in PyTorch