DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Repository from GitHub: https://github.com/deepseek-ai/DeepSeek-V2
XiaoduoAILab opened this issue 8 months ago
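Below is the MLA implementation from the official DeepSeekV3 Hugging Face (HF) repository. It shows that the amount of data stored in the KV cache is even larger than that of the baseline (Llama).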
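The size claim can be checked with per-token arithmetic. The sketch below assumes, as the issue describes, that the HF code caches the up-projected per-head keys and values rather than the compressed latent. The MLA dimensions are assumed to match the published DeepSeek-V3 `config.json` (`num_attention_heads=128`, `qk_nope_head_dim=128`, `qk_rope_head_dim=64`, `v_head_dim=128`, `kv_lora_rank=512`). The "Llama" baseline is a hypothetical GQA setup with 8 KV heads of dimension 128. The function names are illustrative only. All counts are elements per token per layer, not bytes.

```python
# Per-token, per-layer KV-cache element counts (not bytes).
# Assumption: MLA dims follow the published DeepSeek-V3 config.json;
# the GQA baseline (8 KV heads x 128 dims) is a hypothetical Llama-style setup.
from dataclasses import dataclass


@dataclass(frozen=True)
class MLAConfig:
    num_heads: int = 128
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64
    v_head_dim: int = 128
    kv_lora_rank: int = 512


def naive_mla_cache_elems(cfg: MLAConfig) -> int:
    """Cache full per-head K and V after the up-projection (kv_b_proj).

    Each head's key is the non-rope part plus the rope part; each value is v_head_dim.
    """
    k = cfg.num_heads * (cfg.qk_nope_head_dim + cfg.qk_rope_head_dim)
    v = cfg.num_heads * cfg.v_head_dim
    return k + v


def latent_mla_cache_elems(cfg: MLAConfig) -> int:
    """Cache only the compressed KV latent plus the single shared rope key."""
    return cfg.kv_lora_rank + cfg.qk_rope_head_dim


def gqa_cache_elems(num_kv_heads: int, head_dim: int) -> int:
    """Standard GQA: one K vector and one V vector per KV head."""
    return 2 * num_kv_heads * head_dim


if __name__ == "__main__":
    cfg = MLAConfig()
    print("naive MLA  :", naive_mla_cache_elems(cfg))   # 40960
    print("latent MLA :", latent_mla_cache_elems(cfg))  # 576
    print("GQA 8x128  :", gqa_cache_elems(8, 128))      # 2048
```

Under these assumptions, caching the decompressed K/V takes roughly 20x the per-token cache of the GQA baseline, while caching only the latent would take far less. That contrast is why the cached tensor, not the MLA math itself, determines the memory cost.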
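Below are the inference speed benchmarks, which show that inference is significantly slower and GPU memory usage is significantly higher: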