zhuzilin/ring-flash-attention Issues
How ring attention is implemented
Updated 8
mask to zigzag attention
Closed 1
Numerical errors in backward
Updated
Multi-node training speed issue
Updated 2
Multi-GPU qkv dimension issue
Closed 2
Does ring-attn not support dropout?
Updated 3
ring flash attention with BPT
Updated 3
Does the global maximum need to be updated?
Closed 1
stripe_flash_attn_varlen_func
Closed 1
Precision issue
Updated 1
large memory usage
Updated 5
test on 8*A800
Closed 3
Question about TP and the final aggregation of chunked operations
Closed 1
flash attention version
Updated 1
Question about updating lse
Closed 2
ring attention performance is poor when tested on 4 A100 GPUs
Closed 1
ring attention with varlen
Closed 2
Does it support causal attention?
Closed 1
Great work
Updated