See "Sparse Attention with Learning to Hash" (ICLR 2022) for the paper associated with this library.
The code is adapted from Long-Range-Arena and Transformer-XL. We provide the implementation of standard Transformer, Reformer, Performer, and LHA Transformer for encoder-only (LRA) and decoder-only (language modeling) tasks.
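At a high level, the LHA Transformer sparsifies attention by learning hash functions that map queries and keys into buckets, so that each query attends only to the keys sharing its bucket. The following is a minimal, illustrative PyTorch sketch of that bucketing idea, not the library's actual API: the function name `lha_attention_sketch` and the single-head, batchless setup are our assumptions, and the sketch materializes a dense mask for clarity where an efficient implementation would instead gather keys within each bucket.

```python
# Minimal sketch of hash-bucketed sparse attention (illustrative only;
# names and shapes here are assumptions, not the library's API).
import torch

def lha_attention_sketch(q, k, v, hash_q, hash_k):
    """Single-head, batchless sketch: q, k, v are [seq, dim];
    hash_q / hash_k are learned linear maps from dim to num_buckets."""
    # Assign each query/key to the bucket with the highest learned hash score.
    q_buckets = hash_q(q).argmax(dim=-1)  # [seq]
    k_buckets = hash_k(k).argmax(dim=-1)  # [seq]
    # Each query may attend only to keys that landed in the same bucket.
    # (An efficient implementation would gather per bucket instead of
    # building this dense seq x seq mask.)
    same_bucket = q_buckets.unsqueeze(1).eq(k_buckets.unsqueeze(0))
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~same_bucket, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    # A query's bucket may contain no keys; zero out the resulting NaN rows.
    attn = torch.nan_to_num(attn)
    return attn @ v

# Usage: bucket assignments come from small learned projections.
seq, dim, num_buckets = 128, 64, 8
q, k, v = (torch.randn(seq, dim) for _ in range(3))
hash_q = torch.nn.Linear(dim, num_buckets, bias=False)
hash_k = torch.nn.Linear(dim, num_buckets, bias=False)
out = lha_attention_sketch(q, k, v, hash_q, hash_k)  # [seq, dim]
```

Unlike the random hashing used in LSH-style attention (e.g., Reformer), the hash projections above are ordinary learnable parameters, which is the distinction the paper's title refers to.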
If you find this codebase useful, please consider citing the paper:
@inproceedings{sun2022sparse,
  title={Sparse Attention with Learning to Hash},
  author={Zhiqing Sun and Yiming Yang and Shinjae Yoo},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=VGnOJhd5Q1q}
}