NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

Support for expert parallelism in MoE inference

iteratorlee opened this issue

Issue #743 also raises this question. Is there a tutorial or guide on how to use expert parallelism in MoE inference?