eth-easl / fmengine

Utilities for Training Very Large Models

Home Page: https://fmengine.readthedocs.io/

[Flash Attention] Sliding window

xzyaoi opened this issue

It seems sliding-window attention is now supported in FlashAttention: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_interface.py#L556

This seems to be important for the Mistral model, but it is also potentially useful for others. Let's see if we can/should integrate it.
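
For reference, a minimal sketch of how the `window_size` argument could be used (assuming flash-attn >= 2.3 with tensors in the usual `(batch, seqlen, nheads, headdim)` layout; the 4096-token window matches Mistral-7B's config and is just an assumption here, not something fmengine currently exposes):

```python
# Sketch: calling flash_attn_func with a sliding window (flash-attn >= 2.3).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 8192, 8, 64
sliding_window = 4096  # assumed value; Mistral-7B uses a 4096-token window

q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")

# window_size=(left, right): each query attends to at most `left` keys before it
# and `right` keys after it. (-1, -1) disables the window; combining a finite
# left window with causal=True gives Mistral-style sliding-window attention.
out = flash_attn_func(
    q, k, v,
    causal=True,
    window_size=(sliding_window - 1, 0),
)
```

Integration would presumably just mean plumbing a `sliding_window` value from the model config down to this `window_size` argument in the attention layer.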