[Feature] Fix support for sequence parallelism with MoEs

Question

NouamaneTazi opened this issue 4 months ago

Our current MoE implementation only works with tp_mode="ALL_REDUCE". We should fix it for tp_mode="REDUCE_SCATTER" so that MoEs also support sequence parallelism.
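
For context, here is a minimal pure-Python sketch of how the two modes differ in what each tensor-parallel rank holds after the collective. It is not the library's implementation and does not describe the fix. The helper names (all_reduce, reduce_scatter, all_gather) and the toy data are hypothetical, and it assumes the scatter happens along the sequence dimension, as is usual for sequence parallelism.

```python
# Pure-Python simulation of the two collectives; no distributed backend is used.
# All function names here are illustrative, not part of any library API.

def all_reduce(partials):
    """Every rank ends up with the element-wise sum over ranks (full sequence)."""
    summed = [sum(vals) for vals in zip(*partials)]
    return [list(summed) for _ in partials]

def reduce_scatter(partials):
    """Rank r ends up with only the r-th contiguous sequence shard of the sum."""
    world = len(partials)
    summed = [sum(vals) for vals in zip(*partials)]
    assert len(summed) % world == 0, "sequence length must divide evenly"
    shard = len(summed) // world
    return [summed[r * shard:(r + 1) * shard] for r in range(world)]

def all_gather(shards):
    """Concatenate the per-rank shards so every rank sees the full sequence."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]

# Two tensor-parallel ranks, four token positions, one scalar per token.
partials = [[1, 2, 3, 4], [10, 20, 30, 40]]

full = all_reduce(partials)          # every rank holds all four tokens
sharded = reduce_scatter(partials)   # each rank holds two of the four tokens
regathered = all_gather(sharded)     # gathering the shards restores the full view
```

Under REDUCE_SCATTER each rank sees only its own slice of the tokens, so any MoE code that assumes every rank holds the full sequence would need to account for that, for example by gathering before routing or by routing per shard.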