deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from Github https://github.com/deepseek-ai/DeepSeek-V2

Exploring the Combined Effects of YaRN and Adjusted rope_base Values in DeepSeek-V2

hannlp opened this issue

DeepSeek-V2 uses static YaRN with rope_base=10000, which yields excellent extrapolation results. Have the authors tried setting rope_base to 500000 while still using YaRN? If so, does the combination produce a synergistic effect, outperforming both YaRN alone (rope_base=10000) and NTK-aware scaling (rope_base=500000)? @luofuli
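To make the two configurations concrete, here is a minimal sketch of how the YaRN-blended RoPE inverse frequencies could be computed for each rope_base. It follows the commonly used YaRN recipe (a linear ramp between un-scaled and interpolated frequencies). The function name `yarn_inv_freq` and all default values (`dim=64`, `factor=40`, `original_max_pos=4096`, `beta_fast=32`, `beta_slow=1`) are illustrative assumptions, not values confirmed in this thread, and the sketch leaves out YaRN's attention-scaling (mscale) term. It only shows which frequency bands change; it says nothing about which setting extrapolates better.

```python
import math


def yarn_inv_freq(base, dim=64, factor=40.0, original_max_pos=4096,
                  beta_fast=32.0, beta_slow=1.0):
    """Return YaRN-blended inverse frequencies for one RoPE head.

    All defaults are assumptions chosen for illustration only.
    """
    def correction_dim(num_rotations):
        # Dimension index whose wavelength completes `num_rotations`
        # full turns over the original context window.
        return (dim * math.log(original_max_pos / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(dim // 2):
        plain = base ** (-2 * i / dim)   # un-scaled RoPE frequency
        interpolated = plain / factor    # position-interpolated frequency
        # ramp = 0 keeps the plain frequency (high-frequency dims),
        # ramp = 1 uses the interpolated one (low-frequency dims).
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(plain * (1 - ramp) + interpolated * ramp)
    return inv_freq


# Compare a few dimensions for the two rope_base values from the question.
for rope_base in (10000.0, 500000.0):
    freqs = yarn_inv_freq(rope_base)
    print(rope_base, [f"{freqs[i]:.3e}" for i in (0, 8, 16, 31)])
```

Because the blend boundaries depend on log(base), changing rope_base moves the dimensions at which YaRN switches from extrapolation to interpolation, so the two knobs are not independent. Whether that interaction helps or hurts is exactly what the question above asks.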