tlc-pack / libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

The Flash Attention v2 kernel has been extracted from the original repo into this repo to make it easier to integrate into a third-party project. In particular, the dependency on libtorch has been removed.

As a consequence, dropout is not supported (since the original code relies on randomness provided by libtorch). Also, only the forward pass is supported for now.
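
If this repo is vendored into another CMake project, integration might look like the following sketch. The vendored path and the exported target name (flash_attn here) are assumptions, not taken from this README; check this repository's CMakeLists.txt for the actual names.

# Hypothetical integration sketch; the target name "flash_attn" and the
# third_party/ path are assumptions.
add_subdirectory(third_party/libflash_attn)
target_link_libraries(my_app PRIVATE flash_attn)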

Build with

mkdir build && cd build
cmake ..
make

It seems there are compilation issues if g++-9 is used as the host compiler. We confirmed that g++-11 works without issues.
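
If the system default is g++-9, the host compiler can be selected explicitly at configure time. The flags below are standard CMake variables; the exact compiler path may differ on your system.

cmake -DCMAKE_CXX_COMPILER=g++-11 -DCMAKE_CUDA_HOST_COMPILER=g++-11 ..
make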

License: BSD 3-Clause "New" or "Revised" License


Languages

C++ 85.7%, Cuda 13.7%, CMake 0.6%