deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

How should I understand that W^UK can be absorbed into W^Q, and W^UV into W^O?

cc752424640 opened this issue

@cc752424640 Here's a blog post I'd recommend on this: https://spaces.ac.cn/archives/10091
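
For anyone landing on this thread, the absorption comes down to matrix associativity. With q_t = W^Q h_t and k_j = W^UK c_j, the score q_t^T k_j equals ((W^UK)^T W^Q h_t)^T c_j, so the query can be projected into the latent space once and scored directly against the cached latents c_j. Likewise, with v_j = W^UV c_j, the output W^O (sum_j a_j v_j) equals (W^O W^UV)(sum_j a_j c_j). Below is a minimal NumPy sketch that checks this numerically. It is a simplification under stated assumptions, not the repository's implementation: a single head, no RoPE component, random weights, and made-up dimensions; the names `W_Q_abs` and `W_O_abs` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_model, d_head, d_latent, seq_len = 16, 8, 4, 5

# Random stand-ins for the projection matrices (single head).
W_Q = rng.standard_normal((d_head, d_model))    # query projection
W_UK = rng.standard_normal((d_head, d_latent))  # key up-projection
W_UV = rng.standard_normal((d_head, d_latent))  # value up-projection
W_O = rng.standard_normal((d_model, d_head))    # output projection

h_t = rng.standard_normal(d_model)              # current hidden state
C = rng.standard_normal((seq_len, d_latent))    # cached latents c_j, one per row


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


# Naive path: expand keys and values from the latents, then attend.
q = W_Q @ h_t
K = C @ W_UK.T                                  # k_j = W_UK c_j
V = C @ W_UV.T                                  # v_j = W_UV c_j
attn = softmax(K @ q / np.sqrt(d_head))
out_naive = W_O @ (attn @ V)

# Absorbed path: fold W_UK into W_Q and W_UV into W_O ahead of time,
# then attend directly over the cached latents.
W_Q_abs = W_UK.T @ W_Q                          # shape (d_latent, d_model)
W_O_abs = W_O @ W_UV                            # shape (d_model, d_latent)
q_latent = W_Q_abs @ h_t
attn_abs = softmax(C @ q_latent / np.sqrt(d_head))
out_abs = W_O_abs @ (attn_abs @ C)

print(np.allclose(out_naive, out_abs))
```

In this sketch the absorbed path never materializes K or V; it only touches the `d_latent`-sized cache. The trick relies on nothing position-dependent sitting between the query and key projections, which is why this example leaves RoPE out.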