vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Home Page: https://docs.vllm.ai

[Feature]: rope_scaling for qwen2

HappyLynn opened this issue · comments

🚀 The feature, motivation and pitch

We found that the qwen2 model implementation (for example, Qwen2Attention) does not accept rope_scaling. However, we need the YaRN/NTK context-extension features. Could you add support for this?
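For context, here is a minimal sketch of what a rope_scaling option changes. It shows the static NTK-aware variant only; YaRN is a different method that additionally blends interpolated and extrapolated frequencies and rescales the attention temperature. The function names and the example values (head dimension 128, base 10000, factor 4) are hypothetical and are not taken from vLLM's or Qwen2's code.

```python
import math
from typing import List


def rope_inv_freq(head_dim: int, base: float) -> List[float]:
    """Standard RoPE inverse frequencies: 1 / base^(2i/d) for i in [0, d/2)."""
    return [1.0 / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


def ntk_scaled_base(base: float, factor: float, head_dim: int) -> float:
    """Static NTK-aware scaling (hypothetical helper): enlarge the RoPE base so
    the lowest frequency is divided by `factor` while the highest stays 1.0."""
    return base * factor ** (head_dim / (head_dim - 2))


# Illustrative, assumed values; not read from any real model config.
HEAD_DIM = 128
BASE = 10000.0
FACTOR = 4.0

orig_freqs = rope_inv_freq(HEAD_DIM, BASE)
scaled_freqs = rope_inv_freq(HEAD_DIM, ntk_scaled_base(BASE, FACTOR, HEAD_DIM))

# The highest frequency is untouched; the lowest one is stretched by FACTOR,
# which is what lets positions beyond the trained context stay in range.
print(orig_freqs[0], scaled_freqs[0])
print(orig_freqs[-1], scaled_freqs[-1])
```

The base is raised to the power d/(d-2) so that, at the last frequency index (exponent (d-2)/d), the scaling works out to exactly `factor`, while the first frequency (exponent 0) is unaffected.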

Alternatives

No response

Additional context

No response