MLA vs MHA
jiangix-paper opened this issue · comments
Hello, great work. I would like to know why MLA performs better than MHA. As I understand it, MLA is an approximate low-rank decomposition of MHA.
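To make the low-rank premise of the question concrete, here is a minimal NumPy sketch. It assumes the key path is a down-projection to a small latent vector followed by an up-projection, and it omits the decoupled RoPE component. All names and sizes (`W_DKV`, `W_UK`, `d_c`, and so on) are illustrative choices, not taken from the repository code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_model = 64   # hidden size
n_heads = 4
d_head = 16
d_c = 8        # latent (compressed) KV dimension, d_c << n_heads * d_head

# MHA: one dense key projection from the hidden state.
W_K = rng.standard_normal((n_heads * d_head, d_model))

# MLA-style: down-project to a latent vector, then up-project to keys.
W_DKV = rng.standard_normal((d_c, d_model))
W_UK = rng.standard_normal((n_heads * d_head, d_c))

h = rng.standard_normal(d_model)   # hidden state of one token
c_kv = W_DKV @ h                   # the latent vector that would be cached
k_mla = W_UK @ c_kv                # keys for all heads, rebuilt from the latent

# The two-step map equals a single matrix whose rank is at most d_c.
W_K_eff = W_UK @ W_DKV
rank_mha = np.linalg.matrix_rank(W_K)
rank_mla = np.linalg.matrix_rank(W_K_eff)

# Per-token cache size in floats: full keys and values vs. one shared latent.
cache_mha = 2 * n_heads * d_head
cache_mla = d_c

print("rank of MHA key projection:", rank_mha)
print("rank of MLA effective key projection:", rank_mla)
print("cached floats per token (MHA vs MLA):", cache_mha, cache_mla)
```

The sketch only shows that the effective key projection has rank at most `d_c` and that the per-token cache shrinks accordingly. It does not by itself explain any quality difference between MLA and MHA.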
Same issue: #26
@jiangix-paper
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model