jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models

For models with other architectures, such as the Qwen family, how do I find the best `\alpha`, `\beta` and `\sqrt{1/t}` parameters?

ki-ljl opened this issue

The paper states that, for the Llama family, good values of `\alpha` and `\beta` are 1 and 32, but it does not explain how these two values were obtained. The paper also says that `\sqrt{1/t}` can be fitted by choosing the value that gives the lowest perplexity (ppl). Could this fitting procedure be explained in more detail?
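
To make the question concrete, here is a minimal sketch of where the three parameters enter, based on my reading of the paper's "NTK-by-parts" ramp and its Llama fit `\sqrt{1/t} = 0.1 \ln(s) + 1`. The function names, the `slope` argument, and the defaults (`base=10000`, `orig_ctx=4096`, `scale=4`) are my own assumptions for illustration and are not taken from the repository's code. For Qwen, `alpha`, `beta` and `slope` are exactly the values I do not know how to choose.

```python
import math

def yarn_inv_freq(dim, base=10000.0, orig_ctx=4096, scale=4.0,
                  alpha=1.0, beta=32.0):
    """Per-dimension RoPE frequencies after the ramp-based interpolation.

    alpha=1 and beta=32 are the values the paper reports for Llama;
    everything else here is an illustrative assumption.
    """
    inv_freq = []
    for d in range(dim // 2):
        theta = base ** (-2.0 * d / dim)      # original RoPE frequency
        wavelength = 2.0 * math.pi / theta    # tokens per full rotation
        r = orig_ctx / wavelength             # rotations inside the original context
        # Linear ramp: 0 below alpha (fully interpolate), 1 above beta (leave as is).
        gamma = min(1.0, max(0.0, (r - alpha) / (beta - alpha)))
        inv_freq.append((1.0 - gamma) * theta / scale + gamma * theta)
    return inv_freq

def yarn_attention_scale(scale, slope=0.1):
    """sqrt(1/t) = slope * ln(s) + 1; slope=0.1 is the paper's Llama fit."""
    if scale <= 1.0:
        return 1.0
    return slope * math.log(scale) + 1.0

freqs = yarn_inv_freq(dim=128)
attn_scale = yarn_attention_scale(4.0)
```

With these defaults, the highest-frequency dimension is left unchanged and the lowest-frequency dimension is divided by `scale`, so `alpha` and `beta` only decide where the transition between those two regimes happens.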

If anyone can answer my question, I would appreciate it!