Transformer related optimization, including BERT, GPT
iteratorlee opened this issue a year ago
#743 also mentions this issue. Is there a tutorial or guide on how to use expert parallelism in MoE inference?