deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

MLA vs MHA

jiangix-paper opened this issue · comments

Hello, great work. I want to know why the performance of MLA is better than that of MHA. I think MLA is a approximate low-rank decomposition for MHA.