RWKV-howto

Possibly useful materials and tutorials for learning the RWKV language model.

RWKV: Parallelizable RNN with Transformer-level LLM Performance.
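To make the tagline concrete, here is a minimal NumPy sketch of the WKV operator at the core of RWKV, run in its recurrent "RNN mode": each step folds the current token into a small per-channel state, so generation needs only O(1) memory per token. This is our own numerically naive illustration based on the formulas in the RWKV paper, not the official implementation (real kernels also track a running maximum for stability); the function name and shapes are assumptions for this sketch.

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """WKV in RNN mode (naive sketch): sequential over time, O(1) state.
    w: per-channel decay rates (> 0), shape (C,)
    u: per-channel bonus applied to the current token, shape (C,)
    k, v: keys and values, shape (T, C)
    Returns the (T, C) weighted averages wkv_t."""
    T, C = k.shape
    a = np.zeros(C)   # running numerator:   sum of e^{k_i} * v_i, decayed
    b = np.zeros(C)   # running denominator: sum of e^{k_i},       decayed
    out = np.empty((T, C))
    for t in range(T):
        e_now = np.exp(u + k[t])                  # current token gets bonus u
        out[t] = (a + e_now * v[t]) / (b + e_now)
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]  # fold token t into the state
        b = np.exp(-w) * b + np.exp(k[t])
    return out
```

The same operator also admits a parallel form over the whole sequence; see the sketch under Code below.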

Relevant Papers

  • 🌟(2023-05) RWKV: Reinventing RNNs for the Transformer Era arXiv

  • (2023-03) Resurrecting Recurrent Neural Networks for Long Sequences arXiv

  • (2023-02) SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks arXiv

  • (2022-08) Simplified State Space Layers for Sequence Modeling ICLR 2023

  • (2021-10) Efficiently Modeling Long Sequences with Structured State Spaces ICLR 2022

  • 🌟(2021-05) An Attention Free Transformer arXiv

  • (2020-08) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention ICML 2020

  • (2018) Parallelizing Linear Recurrent Neural Nets Over Sequence Length ICLR 2018

  • (2017-10) MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks NeurIPS 2017

  • (2017-09) Simple Recurrent Units for Highly Parallelizable Recurrence EMNLP 2018

  • (2017-06) Attention Is All You Need NeurIPS 2017

  • (2016-11) Quasi-Recurrent Neural Networks ICLR 2017

Resources

  • Introducing RWKV - An RNN with the advantages of a transformer Hugging Face

  • Now that we have the Transformer, can RNNs be abandoned entirely? Zhihu

  • What is the simplest effective form of an RNN? Zhihu

  • 🌟The RNN/CNN duality of RWKV Zhihu

  • Do RNN hidden layers need nonlinearity? Zhihu

  • Google's new work attempts to "resurrect" RNNs: can RNNs shine again? Jianlin Su

  • 🌟How the RWKV language model works Johan Sokrates Wind

  • 🌟The RWKV language model: An RNN with the advantages of a transformer Johan Sokrates Wind

  • The Unreasonable Effectiveness of Recurrent Neural Networks Andrej Karpathy's blog

Code
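As a complement to the recurrent sketch above, the same WKV operator can be written as one big causal weighted average, the parallelizable "GPT mode" used during training: it trades the O(1) recurrent state for O(T²) work that runs in parallel across the sequence. Again a hedged NumPy sketch under the same assumed shapes, not the official CUDA kernel.

```python
import numpy as np

def wkv_parallel(w, u, k, v):
    """WKV in parallel mode (naive sketch): one (T, T, C) weight tensor.
    Matches wkv_recurrent above: the weight of source token i in output t
    is e^{-(t-1-i)w + k_i} for i < t, and e^{u + k_t} for i == t."""
    T, C = k.shape
    t = np.arange(T)
    decay = -(t[:, None] - 1 - t[None, :])[:, :, None] * w  # (T, T, C) time decay
    logits = decay + k[None, :, :]                          # add k_i per source token
    logits[t, t] = u + k                                    # diagonal: bonus u for token t
    causal = (t[:, None] >= t[None, :])[:, :, None]         # mask out future tokens
    wts = np.exp(np.where(causal, logits, -np.inf))         # exp(-inf) = 0 weight
    return (wts * v[None, :, :]).sum(axis=1) / wts.sum(axis=1)

# Sanity check against the recurrent form (hypothetical random inputs):
# T, C = 8, 4
# w, u = np.exp(np.random.randn(C)), np.random.randn(C)
# k, v = np.random.randn(T, C), np.random.randn(T, C)
# assert np.allclose(wkv_recurrent(w, u, k, v), wkv_parallel(w, u, k, v))
```

The two functions agreeing on random inputs is exactly the RNN/"transformer" duality the resources above discuss.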
