feifeibear/long-context-attention Issues
GPU memory usage issue · Closed · 1 comment
What are the advantages of combining these two approaches? Is the technical rationale explained anywhere? · Closed · 1 comment
On data splitting and merging · Closed · 12 comments
Can these methods be used together with DeepSpeed ZeRO? · Closed · 6 comments
The impact of head number · Closed · 2 comments
Sequence Parallel Attention for Long Context LLM Model Training and Inference