pytorch / torchtitan

A native PyTorch Library for large model training

Modify FLOPs in MFU calculation for causal mask when using FlashAttention.

Yuxin-CV opened this issue

Hi, I suggest we modify the FLOPs calculation used in the MFU estimate to match the FlashAttention benchmark script.

Specifically, the current calculation for the causal mask can report more than 100% MFU at seq_len = 16k (189 * 2 / 312 = 1.21), which is inaccurate. Because FlashAttention skips the masked upper triangle of the attention matrix entirely, the attention FLOPs under a causal mask should be divided by 2.

[Figure: FlashAttention-2 forward + backward throughput benchmark on A100 (flash2_a100_fwd_bwd_benchmark)]
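For concreteness, here is a minimal sketch of the proposed change, assuming a PaLM-style FLOPs-per-token estimate of `6 * num_params + 12 * n_layers * n_heads * head_dim * seq_len`; the function name and signature below are illustrative, not torchtitan's exact code:

```python
def get_num_flop_per_token(
    num_params: int,
    n_layers: int,
    n_heads: int,
    head_dim: int,
    seq_len: int,
    causal: bool = True,
) -> int:
    # Attention term of the PaLM-style FLOPs-per-token estimate.
    attn_flop_per_token = 12 * n_layers * n_heads * head_dim * seq_len
    if causal:
        # FlashAttention skips the masked upper triangle, so only about
        # half of the attention FLOPs are actually executed.
        attn_flop_per_token //= 2
    return 6 * num_params + attn_flop_per_token
```

With the halved attention term, the 16k example above works out to roughly 189 / 312 ≈ 0.61 MFU instead of an impossible 1.21.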

There was some past discussion on this (#280).