This repository accompanies the SC '21 paper *E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs*. It contains implementations of several kernels described in the paper, along with a few example encoders.
Tested on an NVIDIA V100S GPU with CUDA 11.4.
There are three encoder examples under `test`, all of which use randomly generated data:

- On-the-fly attention with tensor-tile pruned linear transformations (`encoder_tile_test`); a minimal sketch of the tile-pruning idea follows this list
- Attention-aware pruning with pruned self-attention (`encoder_prune_test`)
- Sequence-aware optimized encoder (`encoder_length_test`)
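For intuition, here is a minimal, hypothetical CUDA sketch of the tensor-tile pruning idea behind `encoder_tile_test`: the weight matrix is stored only as the dense tiles that survive pruning, and the kernel multiplies those tiles while skipping pruned regions entirely. The names, tile size, and storage layout (`tilePrunedGemv`, 16x16 tiles, per-tile block-row/block-col indices) are assumptions for illustration, not the repository's actual kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 16;

// y += W * x, where W is represented only by its non-pruned TILE x TILE blocks.
__global__ void tilePrunedGemv(const float* __restrict__ tiles,   // [nnzTiles][TILE][TILE], surviving tiles
                               const int*   __restrict__ tileRow, // block-row index of each surviving tile
                               const int*   __restrict__ tileCol, // block-col index of each surviving tile
                               const float* __restrict__ x,
                               float*       __restrict__ y,
                               int nnzTiles) {
    int t = blockIdx.x;   // one thread block per surviving tile
    int r = threadIdx.x;  // one thread per row within the tile
    if (t >= nnzTiles || r >= TILE) return;

    const float* w  = tiles + (size_t)t * TILE * TILE;
    const float* xs = x + tileCol[t] * TILE;   // input slice this tile reads

    float acc = 0.0f;
    for (int c = 0; c < TILE; ++c)
        acc += w[r * TILE + c] * xs[c];

    // Tiles sharing a block-row accumulate into the same output slice.
    atomicAdd(&y[tileRow[t] * TILE + r], acc);
}

int main() {
    // Toy case: a 32x32 weight matrix (a 2x2 grid of tiles) where only the
    // two diagonal tiles survive pruning; the off-diagonal tiles are skipped.
    const int nnz = 2, n = 2 * TILE;
    float hTiles[nnz * TILE * TILE], hx[n], hy[n] = {0};
    int hRow[nnz] = {0, 1}, hCol[nnz] = {0, 1};
    for (int i = 0; i < nnz * TILE * TILE; ++i) hTiles[i] = 0.01f;
    for (int i = 0; i < n; ++i) hx[i] = 1.0f;

    float *dTiles, *dx, *dy;
    int *dRow, *dCol;
    cudaMalloc(&dTiles, sizeof(hTiles));
    cudaMalloc(&dRow, sizeof(hRow));
    cudaMalloc(&dCol, sizeof(hCol));
    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dy, sizeof(hy));
    cudaMemcpy(dTiles, hTiles, sizeof(hTiles), cudaMemcpyHostToDevice);
    cudaMemcpy(dRow, hRow, sizeof(hRow), cudaMemcpyHostToDevice);
    cudaMemcpy(dCol, hCol, sizeof(hCol), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, sizeof(hy), cudaMemcpyHostToDevice);

    tilePrunedGemv<<<nnz, TILE>>>(dTiles, dRow, dCol, dx, dy, nnz);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);

    // Each output element should be 0.01 * 16 = 0.16.
    printf("y[0] = %f, y[%d] = %f\n", hy[0], n - 1, hy[n - 1]);
    return 0;
}
```

The payoff of this storage scheme is that work scales with the number of surviving tiles rather than the full matrix size, while each tile remains dense enough to use the GPU efficiently.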
```bash
mkdir build && cd build
cmake ..
make -j
```
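After a successful build, the example binaries can be run directly from the build directory, e.g. `./encoder_tile_test` (assuming the executables are named after their CMake targets; check the generated build tree if the names differ). No input files are needed, since each example generates its own random data.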