feifeibear / long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Question: what are the advantages of combining these two approaches? Is there any write-up of the technical motivation?

nullnonenilNULL opened this issue 4 months ago · comments

HH&CC commented 4 months ago

Jiarui Fang commented 4 months ago

https://zhuanlan.zhihu.com/p/689067888
Take a look at this article.

Also refer to issue #40.