shengzhang90 / Neighborhood-Attention-Transformer

[Preprint] Neighborhood Attention Transformer, 2022

Home Page: https://arxiv.org/abs/2204.07143

Neighborhood Attention Transformers

Powerful hierarchical vision transformers based on sliding window attention.

Neighborhood Attention (NA, local attention) was introduced in our original paper, NAT, and runs efficiently with our CUDA extension to PyTorch, NATTEN.

We recently introduced a new model, DiNAT, which extends NA by dilating neighborhoods (DiNA, sparse global attention).

Combining NA and DiNA preserves locality, expands the receptive field exponentially, and captures longer-range interdependencies, leading to significant performance boosts in downstream vision tasks.
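
To make the receptive-field arithmetic concrete, here is a back-of-the-envelope sketch. It assumes, by analogy with dilated convolutions, that an attention layer with kernel size k and dilation d extends the receptive field by (k - 1) * d; the dilation schedule below is hypothetical, not taken from the papers.

# Rough receptive-field growth for a stack of attention layers.
# Assumption (not from the papers): a layer with kernel size k and
# dilation d adds (k - 1) * d to the receptive field, as in dilated
# convolutions.
def receptive_field(kernel_size, dilations):
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Four plain NA layers (dilation 1): linear growth.
print(receptive_field(7, [1, 1, 1, 1]))    # 25
# Alternating NA/DiNA with a hypothetical dilation schedule:
print(receptive_field(7, [1, 4, 1, 16]))   # 133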

Dilated Neighborhood Attention 🔥

A new hierarchical vision transformer based on Neighborhood Attention (local attention) and Dilated Neighborhood Attention (sparse global attention) that enjoys a significant performance boost in downstream tasks.

Check out the DiNAT README.
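
Conceptually, DiNA only changes which keys a query attends to: with dilation d, the neighborhood becomes every d-th token around the query. Below is a minimal sketch of that index selection; it is our own helper, not part of NATTEN, and it assumes the sequence is long enough to fit the dilated window.

def dilated_neighbor_indices(i, length, kernel_size, dilation):
    # Indices the query at position i attends to. The window is
    # shifted in steps of `dilation` (staying on i's dilation grid)
    # so that all kernel_size indices land inside [0, length).
    r = kernel_size // 2
    start = i - r * dilation
    max_start = length - 1 - (kernel_size - 1) * dilation
    while start < 0:
        start += dilation
    while start > max_start:
        start -= dilation
    return [start + j * dilation for j in range(kernel_size)]

print(dilated_neighbor_indices(16, 32, 7, 1))  # [13, 14, 15, 16, 17, 18, 19]
print(dilated_neighbor_indices(16, 32, 7, 4))  # [4, 8, 12, 16, 20, 24, 28]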

Neighborhood Attention Transformer

Our original paper introduced Neighborhood Attention Transformer (NAT), built on the first efficient sliding-window local attention mechanism.

How Neighborhood Attention works

Neighborhood Attention localizes each query token's receptive field to its nearest neighboring tokens in the key-value pair. It is equivalent to dot-product self attention when the neighborhood size equals the feature map size. Tokens at the edges are a special case: their neighborhoods are clamped inward so that every query still attends to the same number of keys.

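For readers who want the mechanism in code, here is a minimal, unoptimized PyTorch sketch of 1-D Neighborhood Attention. It is our own illustration of the computation described above, not the NATTEN API, and the real CUDA kernels are far faster.

import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, kernel_size=7):
    # q, k, v: (batch, length, dim); assumes kernel_size is odd
    # and kernel_size <= length.
    B, L, D = q.shape
    r = kernel_size // 2
    out = torch.empty_like(q)
    for i in range(L):
        # Clamp the window near the edges so every query still
        # attends to exactly kernel_size keys (NAT's edge handling).
        start = min(max(i - r, 0), L - kernel_size)
        nk = k[:, start:start + kernel_size]               # (B, K, D)
        nv = v[:, start:start + kernel_size]               # (B, K, D)
        scores = q[:, i:i + 1] @ nk.transpose(1, 2) * D ** -0.5
        out[:, i] = (F.softmax(scores, dim=-1) @ nv).squeeze(1)
    return out

q = k = v = torch.randn(2, 10, 32)
print(neighborhood_attention_1d(q, k, v).shape)  # torch.Size([2, 10, 32])

The 2-D case used in NAT is the same idea per spatial axis: each query pixel attends to a kernel_size x kernel_size window clamped to stay inside the feature map.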

News

September 29, 2022

  • New preprint: Dilated Neighborhood Attention Transformer.

July 9, 2022

  • NA CUDA extension v0.12 released.
    • NA now runs much more efficiently: up to 40% faster while using up to 25% less memory than Swin Transformer's Shifted Window Self Attention.
    • Improved FP16 throughput.
    • Improved training speed and stability.
    • See changelog.

May 12, 2022

  • 1-D Neighborhood Attention support added!
  • Moved the kernel to natten/, since there is now a single version for all three tasks and we are adding more features to the extension.

April 30, 2022

  • NA CUDA extension v0.11 released.
    • It is faster in both training and inference.
    • A single version now covers all three tasks (no more downstream-specific versions).
  • PyTorch implementation released
    • Works both with and without CUDA, but is not very efficient; use the CUDA extension when possible.
    • See changelog.

Catalog

  • Neighborhood Attention 1D (CUDA)
  • Neighborhood Attention 2D (CUDA)
  • Neighborhood Attention 1D (PyTorch)
  • Neighborhood Attention 2D (PyTorch)
  • Dilation support
  • BFloat16 support (coming soon)
  • Zeros/Valid padding support (coming soon)
  • HuggingFace Demo

Citation

@article{hassani2022neighborhood,
	title        = {Neighborhood Attention Transformer},
	author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
	year         = 2022,
	url          = {https://arxiv.org/abs/2204.07143},
	eprint       = {2204.07143},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}
@article{hassani2022dilated,
	title        = {Dilated Neighborhood Attention Transformer},
	author       = {Ali Hassani and Humphrey Shi},
	year         = 2022,
	url          = {https://arxiv.org/abs/2209.15001},
	eprint       = {2209.15001},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

About

License: MIT License


Languages

  • Python: 63.1%
  • Cuda: 35.0%
  • C++: 1.8%
  • Shell: 0.2%