deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from GitHub https://github.com/deepseek-ai/DeepSeek-V2

The open-source code on HuggingFace does not seem to implement matrix absorption

meteorlin opened this issue

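Hello authors! I read the code you open-sourced on HuggingFace, and the attention part does not seem to implement the merging (absorption) of the Q and K projection matrices described in the paper. Could you point out where exactly the equivalent implementation is done?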

I have the same question. In the open-source implementation on HuggingFace, K still has multiple heads, and the full K and V are still cached during inference, which is completely different from what the architecture section of the paper describes.
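For readers following the thread, here is a minimal sketch of what "absorbing" the K up-projection into the Q projection means for a single head. It is not the repository's code. The names `W_uq`, `W_uk`, `c_q`, `c_kv` and the toy dimensions are assumptions made for illustration, and the decoupled RoPE part of the attention is left out. It only shows that the attention scores can be computed from the cached compressed latents without materializing per-head keys.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_q, d_c, d_h, n_tokens = 6, 4, 8, 5  # hypothetical toy sizes, not the model's

W_uq = rand_matrix(d_q, d_h, rng)       # query up-projection (assumed name)
W_uk = rand_matrix(d_c, d_h, rng)       # key up-projection (assumed name)
c_q = rand_matrix(1, d_q, rng)          # compressed query of the current token
c_kv = rand_matrix(n_tokens, d_c, rng)  # cached compressed KV latents

# Naive path: up-project the cached latents into full keys, then take dot products.
q = matmul(c_q, W_uq)                      # 1 x d_h
k = matmul(c_kv, W_uk)                     # n_tokens x d_h
scores_naive = matmul(q, transpose(k))[0]  # one score per cached token

# Absorbed path: fold W_uk into the query side once, then score directly
# against the compressed latents, so full keys are never built or cached.
W_absorbed = matmul(W_uq, transpose(W_uk))  # d_q x d_c
scores_absorbed = matmul(matmul(c_q, W_absorbed), transpose(c_kv))[0]

max_diff = max(abs(a - b) for a, b in zip(scores_naive, scores_absorbed))
print(max_diff)  # differs from zero only by floating-point rounding
```

Both paths compute `c_q · W_uq · W_ukᵀ · c_kvᵀ`; the absorbed path just groups the product differently. An implementation that up-projects and caches full per-head K and V, as the comments above describe, corresponds to the naive path.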