deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Device-Level Balance Loss and Communication Balance Loss

hsm1997 opened this issue

What is the main difference between the device-level balance loss and the communication balance loss?
As far as I can tell from your paper, pi' == pi'', and fi' = some_coeff * fi''.
Maybe fi'' should instead be:
... (Token t is sent to Device i from Device j, where j != i)

Or maybe the authors already meant this by using the word "sent".
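To make the question concrete, here is a minimal Python sketch of three ways one could count the load on a device. It is only an illustration of the distinction being asked about, not the paper's definition: the toy routing table, the `token_home` placement, and the function names are all hypothetical, and the paper's normalization coefficients are deliberately left out.

```python
# Hypothetical toy setup: 4 experts on 2 devices, 4 tokens, top-2 routing.
expert_device = [0, 0, 1, 1]   # expert index -> device hosting that expert
token_home = [0, 1, 1, 1]      # token index -> device the token starts on (assumed)
token_experts = [[0, 1], [0, 1], [0, 2], [2, 3]]  # experts selected by each token
num_devices = 2


def selection_counts():
    """Count every (token, expert) selection that lands on each device."""
    counts = [0] * num_devices
    for experts in token_experts:
        for e in experts:
            counts[expert_device[e]] += 1
    return counts


def sent_counts(exclude_local=False):
    """Count each token at most once per destination device.

    With exclude_local=True, a token is not counted for the device it
    already lives on, i.e. only sends from Device j to Device i with j != i.
    """
    counts = [0] * num_devices
    for t, experts in enumerate(token_experts):
        for d in {expert_device[e] for e in experts}:
            if exclude_local and d == token_home[t]:
                continue
            counts[d] += 1
    return counts


print(selection_counts())               # [5, 3]
print(sent_counts())                    # [3, 2]
print(sent_counts(exclude_local=True))  # [2, 0]
```

In this toy case the three counts differ per device, and the last variant corresponds to the "sent from Device j where j != i" reading proposed above. Whether the paper intends that reading is exactly what this issue is asking.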