[Feature] Fix support for sequence parallelism with MoEs
NouamaneTazi opened this issue
Nouamane Tazi commented
Our current MoE implementation only works with `tp_mode="ALL_REDUCE"`. We should fix it so that it also works with `tp_mode="REDUCE_SCATTER"`, which is required to support sequence parallelism.
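To make the difference between the two modes concrete, here is a minimal pure-Python simulation of one MoE layer on a 2-rank tensor-parallel group. It rests on several assumptions that are not in this issue: top-1 routing with a replicated router, experts that are ReLU MLPs sharded Megatron-style (column-parallel first linear, row-parallel second linear), and collectives replaced by list operations. All function names are hypothetical, and this is not nanotron's implementation. In the `REDUCE_SCATTER` path the input arrives sharded along the sequence dimension. The sketch therefore all-gathers tokens before routing, so every rank routes the same tokens, and then reduce-scatters the summed expert outputs back into sequence shards.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Experts = List[Tuple[Matrix, Matrix]]  # (W1, W2) per expert, full or sharded


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a (n x k) and b (k x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(a: Matrix) -> Matrix:
    return [[max(0, v) for v in row] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_cols(m: Matrix, n: int) -> List[Matrix]:
    """Column-parallel shards: rank r keeps a contiguous block of columns."""
    step = len(m[0]) // n
    return [[row[r * step:(r + 1) * step] for row in m] for r in range(n)]


def split_rows(m: Matrix, n: int) -> List[Matrix]:
    """Row-parallel shards; also how activations are sharded along the sequence dim."""
    step = len(m) // n
    return [m[r * step:(r + 1) * step] for r in range(n)]


def moe_partial(x: Matrix, router: Matrix, experts: Experts) -> Matrix:
    """MoE output for every token in x using the expert weights this rank holds.

    With unsharded weights this is the full output; with TP shards it is this
    rank's partial sum. Ranks agree on routing only if they see the same tokens.
    """
    logits = matmul(x, router)
    out = []
    for token, scores in zip(x, logits):
        w1, w2 = experts[scores.index(max(scores))]  # top-1 expert for this token
        out.append(matmul(relu(matmul([token], w1)), w2)[0])
    return out


def sum_partials(partials: List[Matrix]) -> Matrix:
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)
    return total


def moe_all_reduce(x: Matrix, router: Matrix, rank_experts: List[Experts]) -> List[Matrix]:
    """ALL_REDUCE-style: every rank holds the full sequence, in and out."""
    total = sum_partials([moe_partial(x, router, e) for e in rank_experts])  # all-reduce
    return [total for _ in rank_experts]


def moe_reduce_scatter(x_shards: List[Matrix], router: Matrix,
                       rank_experts: List[Experts]) -> List[Matrix]:
    """REDUCE_SCATTER-style: each rank holds only its slice of the sequence."""
    full_x = [row for shard in x_shards for row in shard]  # all-gather along sequence
    total = sum_partials([moe_partial(full_x, router, e) for e in rank_experts])
    return split_rows(total, len(rank_experts))  # reduce-scatter along sequence


def toy(rows: int, cols: int, seed: int) -> Matrix:
    """Deterministic small-integer matrix so results compare exactly."""
    return [[(i * 3 + j * 5 + seed) % 7 - 3 for j in range(cols)] for i in range(rows)]


tp, seq, hidden, ffn, n_experts = 2, 4, 4, 4, 2
x = toy(seq, hidden, 0)
router = toy(hidden, n_experts, 1)
experts = [(toy(hidden, ffn, 2 + e), toy(ffn, hidden, 5 + e)) for e in range(n_experts)]
# Rank r holds a column shard of each W1 and the matching row shard of each W2.
rank_experts = [[(split_cols(w1, tp)[r], split_rows(w2, tp)[r]) for w1, w2 in experts]
                for r in range(tp)]

reference = moe_partial(x, router, experts)
ar_out = moe_all_reduce(x, router, rank_experts)
rs_out = moe_reduce_scatter(split_rows(x, tp), router, rank_experts)
print(all(out == reference for out in ar_out))                 # True
print([row for shard in rs_out for row in shard] == reference)  # True
```

Both modes compute the same values and differ only in layout. With `ALL_REDUCE` every rank ends with the full `[seq, hidden]` output, while with `REDUCE_SCATTER` rank `r` holds only its contiguous block of `seq / tp` tokens. A ring all-reduce is itself a reduce-scatter followed by an all-gather, so replacing it with an all-gather before the experts and a reduce-scatter after them should not add communication volume.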