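A question: what are the advantages of using these two approaches together? Is there any write-up of the technical motivation behind it?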
nullnonenilNULL opened this issue · comments
https://zhuanlan.zhihu.com/p/689067888
Take a look at this article.
Also see issue #40.
Sequence Parallel Attention for Long Context LLM Model Training and Inference