NVIDIA / transformer-ls

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).

Long-Short Transformer (Transformer-LS)

This repository hosts the code and models for the paper:

Long-Short Transformer: Efficient Transformers for Language and Vision

Updates

Architecture

Long-Short Transformer substitutes the full self-attention of the original Transformer models with an efficient attention mechanism that considers both long-range and short-term correlations. Each query attends to tokens from a segment-wise sliding window to capture short-term correlations, and to dynamically projected features to capture long-range correlations. To align the norms of the original and projected feature vectors and improve the efficacy of the aggregation, we normalize the original and projected feature vectors with two sets of Layer Normalizations.
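The mechanism above can be illustrated with a minimal NumPy sketch of one non-causal attention head. This is not the repository's PyTorch code, and several details are assumptions: each query in a length-w segment attends to its home segment plus w // 2 tokens on each side; the dynamic projection is taken as P = softmax(K Wp) over the sequence axis, compressing n keys and values into r vectors; one LayerNorm is applied to the local keys/values and a second to the projected ones; and a single softmax runs over both key sets. The names `long_short_attention`, `Wp`, `ln_local`, and `ln_global` are hypothetical, and multi-head batching, padding, and causal masking are omitted.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each row to zero mean / unit variance, then apply a learned scale and shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def long_short_attention(x, Wq, Wk, Wv, Wp, w, ln_local, ln_global):
    """Hypothetical single-head, non-causal long-short attention.

    x: (n, d) token features; Wq/Wk/Wv: (d, d) projections;
    Wp: (d, r) weights for the dynamic projection (assumed form);
    w: segment length; ln_local / ln_global: (gamma, beta) pairs for the two LayerNorms.
    """
    n = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Long-range branch: input-dependent projection weights P (n, r),
    # softmax-normalized over the sequence axis, compress n keys/values to r.
    p = softmax(k @ Wp, axis=0)
    k_glb = layer_norm(p.T @ k, *ln_global)
    v_glb = layer_norm(p.T @ v, *ln_global)

    # Short-term branch keys/values get their own LayerNorm (second set of LN parameters).
    k_loc = layer_norm(k, *ln_local)
    v_loc = layer_norm(v, *ln_local)

    out = np.empty_like(v)
    for start in range(0, n, w):
        end = min(start + w, n)
        # Window: the home segment plus w // 2 tokens on each side, clipped at the edges.
        lo, hi = max(0, start - w // 2), min(n, end + w // 2)
        keys = np.concatenate([k_loc[lo:hi], k_glb], axis=0)
        vals = np.concatenate([v_loc[lo:hi], v_glb], axis=0)
        # One softmax over local and projected keys together aggregates both branches.
        scores = q[start:end] @ keys.T / np.sqrt(q.shape[1])
        out[start:end] = softmax(scores, axis=-1) @ vals
    return out


# Usage with random weights: 10 tokens, width 8, rank-3 projection, segment length 4.
rng = np.random.default_rng(0)
n, d, r, w = 10, 8, 3, 4
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Wp = rng.standard_normal((d, r)) / np.sqrt(d)
ln_local = (np.ones(d), np.zeros(d))
ln_global = (np.ones(d), np.zeros(d))
out = long_short_attention(x, Wq, Wk, Wv, Wp, w, ln_local, ln_global)
```

In a trained model the two (gamma, beta) pairs would be learned separately; they start out identical here only for the demo. Each query scores at most 2w + r keys rather than all n, which is where the efficiency comes from.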

Tasks

About

License: MIT License


Languages

Python 94.1%, Shell 5.7%, Dockerfile 0.2%