Repositories under the linear-attention topic:
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
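The "trains in parallel like a transformer, runs as a constant-space RNN with no kv-cache" property described above is the defining trait of linear-attention models. Below is a minimal illustrative sketch (not RWKV's actual formulation) of generic causal linear attention in PyTorch: the same computation written once as a parallel cumulative sum and once as a recurrence with a fixed-size state. The feature map, shapes, and function names are assumptions for illustration only.

```python
# Minimal sketch of causal linear attention, assuming a simple positive
# feature map phi(x) = elu(x) + 1. Not RWKV itself; names are illustrative.
import torch
import torch.nn.functional as F

def phi(x):
    return F.elu(x) + 1  # keeps features positive so the denominator is safe

def linear_attention_parallel(q, k, v):
    # q, k, v: (T, d). Parallel "transformer-style" form via cumulative sums.
    q, k = phi(q), phi(k)
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=0)  # (T, d, d_v)
    z = torch.cumsum(k, dim=0)                                   # (T, d)
    num = torch.einsum('td,tde->te', q, kv)
    den = (q * z).sum(-1, keepdim=True).clamp(min=1e-6)
    return num / den

def linear_attention_recurrent(q, k, v):
    # Same computation as a recurrence: fixed-size state, no kv-cache.
    q, k = phi(q), phi(k)
    d, d_v = q.shape[-1], v.shape[-1]
    S = torch.zeros(d, d_v)  # running sum of outer products k_t v_t^T
    z = torch.zeros(d)       # running sum of k_t
    outs = []
    for t in range(q.shape[0]):
        S = S + k[t].unsqueeze(-1) * v[t].unsqueeze(0)
        z = z + k[t]
        outs.append((q[t] @ S) / (q[t] @ z).clamp(min=1e-6))
    return torch.stack(outs)

T, d = 8, 16
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
assert torch.allclose(linear_attention_parallel(q, k, v),
                      linear_attention_recurrent(q, k, v), atol=1e-4)
```

RWKV-7 itself uses a more elaborate gated state-update rule, but the parallel/recurrent equivalence sketched here is the general mechanism behind the "linear time, constant space" claim.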
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
[NeurIPS 2024] Official code of "LION: Linear Group RNN for 3D Object Detection in Point Clouds"
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of Agent Attention in PyTorch
Semantic segmentation of remote sensing images
[NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting
CUDA implementation of autoregressive linear attention, with all the latest research findings
Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
Implementation of: Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
[ICML 2024] Official implementation of "LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions."
RWKV Wiki website (archived; please visit the official wiki)
LEAP: Linear Explainable Attention in Parallel for causal language modeling with O(1) path length and O(1) inference
🔍 Enhance your workflow with Houtini LM, an MCP server that offloads code analysis and documentation tasks to LM Studio, streamlining your development process.
Taming Transformers for High-Resolution Image Synthesis
Independent and reproducible benchmarking of linear attention models
Pure PyTorch implementations of popular linear attention models