Exploring the Combined Effects of YaRN and Adjusted rope_base Values in DeepSeek-V2

Question

hannlp opened this issue a year ago

DeepSeek-V2 uses static YaRN with rope_base=10000 and achieves excellent extrapolation results. Have the authors tried setting rope_base to 500000 while using YaRN? If so, does the combination have a synergistic effect that outperforms both YaRN alone (rope_base=10000) and NTK-aware scaling alone (rope_base=500000)? @luofuli
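For readers less familiar with the two knobs being compared, the sketch below shows how `rope_base` sets the RoPE frequencies and how YaRN's NTK-by-parts interpolation then rescales them. It follows the formulation in the YaRN paper and is not DeepSeek-V2's actual implementation. The function names are hypothetical, and the rotary dimension of 64, the 4096-token original context, the 40x scale factor, and the ramp bounds alpha=1, beta=32 are assumptions chosen for illustration.

```python
import math

# Hypothetical helper names; formulas follow the YaRN paper, not DeepSeek-V2's source.


def rope_inv_freq(dim: int, base: float) -> list[float]:
    """Standard RoPE angular frequencies: theta_i = base ** (-2i / dim)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rotations(theta: float, orig_ctx: int) -> float:
    """r = orig_ctx / wavelength: full rotations a dimension completes in the original context."""
    return orig_ctx * theta / (2 * math.pi)


def ramp(r: float, alpha: float, beta: float) -> float:
    """0 below alpha (fully interpolate), 1 above beta (keep as-is), linear in between."""
    if r < alpha:
        return 0.0
    if r > beta:
        return 1.0
    return (r - alpha) / (beta - alpha)


def yarn_inv_freq(dim, base, scale, orig_ctx, alpha=1.0, beta=32.0):
    """NTK-by-parts: per dimension, blend the interpolated theta / scale with the original theta."""
    out = []
    for theta in rope_inv_freq(dim, base):
        gamma = ramp(rotations(theta, orig_ctx), alpha, beta)
        out.append((1 - gamma) * theta / scale + gamma * theta)
    return out


def yarn_attention_scale(scale: float) -> float:
    """Logit temperature factor sqrt(1/t) = 0.1 * ln(s) + 1, as suggested in the YaRN paper."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1 else 1.0


def classify_dims(dim, base, orig_ctx, alpha=1.0, beta=32.0):
    """Count how many frequency pairs YaRN would interpolate, blend, or leave unchanged."""
    counts = {"interpolated": 0, "blended": 0, "unchanged": 0}
    for theta in rope_inv_freq(dim, base):
        r = rotations(theta, orig_ctx)
        if r < alpha:
            counts["interpolated"] += 1
        elif r > beta:
            counts["unchanged"] += 1
        else:
            counts["blended"] += 1
    return counts


if __name__ == "__main__":
    # Assumed setting: 64 rotary dims, 4096-token original context.
    for base in (10000, 500000):
        print(base, classify_dims(64, base, 4096))
```

Because a larger `rope_base` lengthens every wavelength, fewer dimensions complete many rotations within the original context. With the same `orig_ctx`, raising the base therefore moves more dimensions into the fully interpolated group and leaves fewer untouched by YaRN.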