See "Sparse Attention with Learning to Hash" (ICLR 2022) for the paper associated with this library.
The code is adapted from Long-Range-Arena and Transformer-XL. We provide the implementation of standard Transformer, Reformer, Performer, and LHA Transformer for encoder-only (LRA) and decoder-only (language modeling) tasks.
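At a high level, the LHA Transformer sparsifies attention by learning hash functions that map queries and keys into buckets, so that each query attends only to the keys sharing its bucket. The following is a minimal, illustrative PyTorch sketch of that bucketing idea, not the library's actual API: the function name `lha_attention_sketch` and the single-head, batchless setup are our assumptions, and the sketch materializes a dense mask for clarity where an efficient implementation would instead gather keys within each bucket.

```python
# Minimal sketch of hash-bucketed sparse attention (illustrative only;
# names and shapes here are assumptions, not the library's API).
import torch

def lha_attention_sketch(q, k, v, hash_q, hash_k):
    """Single-head, batchless sketch: q, k, v are [seq, dim];
    hash_q / hash_k are learned linear maps from dim to num_buckets."""
    # Assign each query/key to the bucket with the highest learned hash score.
    q_buckets = hash_q(q).argmax(dim=-1)  # [seq]
    k_buckets = hash_k(k).argmax(dim=-1)  # [seq]
    # Each query may attend only to keys that landed in the same bucket.
    # (An efficient implementation would gather per bucket instead of
    # building this dense seq x seq mask.)
    same_bucket = q_buckets.unsqueeze(1).eq(k_buckets.unsqueeze(0))
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~same_bucket, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    # A query's bucket may contain no keys; zero out the resulting NaN rows.
    attn = torch.nan_to_num(attn)
    return attn @ v

# Usage: bucket assignments come from small learned projections.
seq, dim, num_buckets = 128, 64, 8
q, k, v = (torch.randn(seq, dim) for _ in range(3))
hash_q = torch.nn.Linear(dim, num_buckets, bias=False)
hash_k = torch.nn.Linear(dim, num_buckets, bias=False)
out = lha_attention_sketch(q, k, v, hash_q, hash_k)  # [seq, dim]
```

Unlike the random hashing used in LSH-style attention (e.g., Reformer), the hash projections above are ordinary learnable parameters, which is the distinction the paper's title refers to.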
If you find this codebase useful, please consider citing the paper:
@inproceedings{sun2022sparse,
  title={Sparse Attention with Learning to Hash},
  author={Zhiqing Sun and Yiming Yang and Shinjae Yoo},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=VGnOJhd5Q1q}
}