deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from GitHub: https://github.com/deepseek-ai/DeepSeek-V2

Equation 7 in the DeepSeek-V2 Technical Report

whatdhack opened this issue

I am trying to understand Equation 7 in the DeepSeek-V2 tech report. Here are the points I am confused about.

  1. Are q_{t,i}, k_{t,i}, and v_{t,i} row vectors, with shape (1, d_h)?
  2. Is the shape of q_{t,i}^T k_{j,i} then (d_h, 1) · (1, d_h) = (d_h, d_h)?
  3. Does Softmax_j mean the softmax is taken over the last (-1) dimension?
  4. Would q_{t,i}^T k_{j,i} · v_{j,i} then be computed as (d_h, d_h) · (1, d_h)?
  5. What is the summation over j for?
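
One way to check the shapes is to write the computation out for a single head. The sketch below is not taken from the DeepSeek-V2 code. It assumes Equation 7 is the usual per-head causal attention, o_{t,i} = sum over j = 1..t of Softmax_j(q_{t,i}^T k_{j,i} / sqrt(d_h)) v_{j,i}, and that q, k, and v are plain vectors of length d_h. The function name `head_output` and the toy inputs are hypothetical. Under that reading, q^T k is a dot product (one scalar per position j), Softmax_j normalises those scalars across positions rather than across features, and the sum over j is a weighted average of the value vectors.

```python
import math


def head_output(q, keys, values):
    """Output of one attention head for one query position.

    q      : list of d_h floats (the query at position t)
    keys   : list of t lists, each of d_h floats (keys at positions 1..t)
    values : list of t lists, each of d_h floats (values at positions 1..t)
    Returns a list of d_h floats.
    """
    d_h = len(q)

    # q^T k_j is a dot product of two length-d_h vectors: one scalar per j.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_h) for k in keys]

    # Softmax_j: normalise the t scalars across positions j (not across features).
    max_score = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Sum over j: weighted average of the value vectors, again of length d_h.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(d_h)]


# Toy inputs (hypothetical): t = 3 positions, d_h = 2.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
o = head_output(q, keys, values)
```

With this reading, no (d_h, d_h) matrix appears at any point: each score is a scalar, and the result `o` has the same length d_h as a single value vector.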