deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Error in Equation 16?

zhongmz opened this issue

It appears that the current formulation is ?

q_{t,i} = [q^{C}_{t,i}; q_{t}^{R}],

The formula is correct. q^R is multi-head: each head i has its own q^R_{t,i}, and only k^R is shared across heads.
You can refer to the architecture illustration of DeepSeek-V2 for an intuitive understanding.
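
To make the asymmetry concrete, here is a minimal sketch in plain Python of how the per-head queries and keys could be assembled. It is only an illustration of the indexing, not the model's actual code: the function name `concat_heads`, the tiny dimensions, and the random vectors are all assumptions made for the example, and RoPE and the projections themselves are omitted.

```python
import random


def concat_heads(q_c, q_r, k_c, k_r_shared):
    """Assemble per-head queries and keys by concatenation.

    q_c, q_r, k_c: lists with one vector (list of floats) per head.
    k_r_shared:    a single vector reused by every head.
    """
    n_heads = len(q_c)
    assert len(q_r) == n_heads and len(k_c) == n_heads
    # Query: head i uses its own rotary part q_r[i].
    q = [q_c[i] + q_r[i] for i in range(n_heads)]
    # Key: every head appends the same shared rotary part.
    k = [k_c[i] + k_r_shared for i in range(n_heads)]
    return q, k


# Illustrative sizes only (hypothetical, far smaller than the real model).
N_HEADS, D_CONTENT, D_ROPE = 4, 8, 2
rng = random.Random(0)


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


q_c = [rand_vec(D_CONTENT) for _ in range(N_HEADS)]
q_r = [rand_vec(D_ROPE) for _ in range(N_HEADS)]  # one rotary query per head
k_c = [rand_vec(D_CONTENT) for _ in range(N_HEADS)]
k_r = rand_vec(D_ROPE)                            # single shared rotary key

q, k = concat_heads(q_c, q_r, k_c, k_r)
```

Every `q[i]` and `k[i]` has length `D_CONTENT + D_ROPE`, but the rotary tail of `k[i]` is identical for all heads while the rotary tail of `q[i]` differs per head, which is why the query side keeps the head index i on q^R.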