deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from GitHub: https://github.com/deepseek-ai/DeepSeek-V2

The MLA implementation brings no benefit

XiaoduoAILab opened this issue

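Below is the MLA implementation from the official DeepSeekV3 Hugging Face (HF) repository. As it shows, the amount of data stored in the KV cache is even larger than for the baseline (Llama).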
[image: befd0b9cfd014cb5f829002f1d1af1b]
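To make the size claim concrete, here is a rough per-token, per-layer count of cached elements. It is only a sketch under stated assumptions: the dimensions are taken from the published DeepSeek-V2 config (128 heads, `qk_nope_head_dim=128`, `qk_rope_head_dim=64`, `v_head_dim=128`, `kv_lora_rank=512`), the "naive" case assumes the implementation caches the fully decompressed per-head keys and values, and the baseline is a hypothetical plain multi-head attention layer with the same head count and a head dimension of 128, not an actual Llama checkpoint. The function and class names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MLAConfig:
    # Assumed values, taken from the published DeepSeek-V2 config.
    num_heads: int = 128
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64
    v_head_dim: int = 128
    kv_lora_rank: int = 512


def naive_mla_cache_elems(cfg: MLAConfig) -> int:
    """Elements cached per token per layer if decompressed K and V are stored."""
    k_elems = cfg.num_heads * (cfg.qk_nope_head_dim + cfg.qk_rope_head_dim)
    v_elems = cfg.num_heads * cfg.v_head_dim
    return k_elems + v_elems


def latent_mla_cache_elems(cfg: MLAConfig) -> int:
    """Elements cached per token per layer if only the compressed latent
    and the shared RoPE key are stored."""
    return cfg.kv_lora_rank + cfg.qk_rope_head_dim


def mha_cache_elems(num_heads: int, head_dim: int) -> int:
    """Elements cached per token per layer for plain multi-head attention."""
    return 2 * num_heads * head_dim


cfg = MLAConfig()
naive = naive_mla_cache_elems(cfg)
latent = latent_mla_cache_elems(cfg)
baseline = mha_cache_elems(num_heads=128, head_dim=128)  # hypothetical baseline

print(f"naive MLA cache : {naive} elements/token/layer")
print(f"latent MLA cache: {latent} elements/token/layer")
print(f"MHA baseline    : {baseline} elements/token/layer")
print(f"naive / baseline: {naive / baseline:.2f}x")
```

Under these assumptions, caching the decompressed tensors costs more than the MHA baseline because each key carries the extra 64 RoPE dimensions, while caching only the latent would be far smaller. Whether the HF code behaves as the naive case is the issue author's observation, not something this sketch verifies.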

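Below are the inference speed measurements, which show that inference is significantly slower and GPU memory usage is significantly higher: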
[image]