| 1 | CKConv | CKConv: Continuous Kernel Convolution For Sequential Data | https://github.com/dwromero/ckconv | ICLR 2022 | https://openreview.net/forum?id=8FhxBtXSl0 |
| 2 | FlexConv | FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes | https://github.com/rjbruin/flexconv | ICLR 2022 | https://openreview.net/forum?id=3jooF27-0Wy |
| 3 | DLR | Simplifying and Understanding State Space Models with Diagonal Linear RNNs | https://github.com/ag1988/dlr | | https://arxiv.org/abs/2212.00768 |
| 4 | CCNN | Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN | https://github.com/david-knigge/ccnn | ICLR 2023 | https://openreview.net/forum?id=ZW5aK4yCRqU |
| 5 | SGConv | What Makes Convolutional Models Great on Long Sequence Modeling? | https://github.com/ctlllll/SGConv | ICLR 2023 | https://openreview.net/forum?id=TGJSPbRpJX- |
| 6 | Mega | Mega: Moving Average Equipped Gated Attention | https://github.com/facebookresearch/mega | ICLR 2023 | https://openreview.net/forum?id=qNLe3iq2El |
| 7 | TNN | Toeplitz Neural Network for Sequence Modeling | https://github.com/Doraemonzzz/tnn-pytorch | ICLR 2023 | https://openreview.net/forum?id=IxmWsm4xrua |
| 8 | Hyena | Hyena Hierarchy: Towards Larger Convolutional Language Models | https://github.com/hazyresearch/safari | ICML 2023 | https://proceedings.mlr.press/v202/poli23a.html |
| 9 | MultiresNet | Sequence Modeling with Multiresolution Convolutional Memory | https://github.com/thjashin/multires-conv | ICML 2023 | https://proceedings.mlr.press/v202/shi23f.html |
| 10 | LRU | Resurrecting Recurrent Neural Networks for Long Sequences | | ICML 2023 | https://proceedings.mlr.press/v202/orvieto23a.html |
| 11 | RWKV v4 (Dove) | RWKV: Reinventing RNNs for the Transformer Era | https://github.com/BlinkDL/RWKV-LM | EMNLP 2023 | https://aclanthology.org/2023.findings-emnlp.936/ |
| 12 | RetNet | Retentive Network: A Successor to Transformer for Large Language Models | https://github.com/microsoft/torchscale | | https://arxiv.org/abs/2307.08621 |
| 13 | MultiHyena | Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions | | NeurIPS 2023 | https://openreview.net/forum?id=OWELckerm6 |
| 14 | Monarch Mixer | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | https://github.com/HazyResearch/m2 | NeurIPS 2023 | https://openreview.net/forum?id=cB0BImqSS9 |
| 15 | SeqBoat | Sparse Modular Activation for Efficient Sequence Modeling | https://github.com/renll/SeqBoat | NeurIPS 2023 | https://openreview.net/forum?id=TfbzX6I14i |
| 16 | HGRN | Hierarchically Gated Recurrent Neural Network for Sequence Modeling | https://github.com/OpenNLPLab/HGRN | NeurIPS 2023 | https://openreview.net/forum?id=P1TCHxJwLB |
| 17 | GLA Transformer | Gated Linear Attention Transformers with Hardware-Efficient Training | https://github.com/sustcsonglin/flash-linear-attention | | https://arxiv.org/abs/2312.06635 |
| 18 | Orchid | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | | | https://arxiv.org/abs/2402.18508 |
| 19 | RWKV v5 (Eagle) and v6 (Finch) | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | https://huggingface.co/RWKV | | https://arxiv.org/abs/2404.05892 |
| 20 | HGRN2 | HGRN2: Gated Linear RNNs with State Expansion | https://github.com/OpenNLPLab/HGRN2 | | https://arxiv.org/abs/2404.07904 |