SafeAILab / EAGLE

Official Implementation of EAGLE

Home Page: https://arxiv.org/abs/2401.15077


About reproducing baseline results

dydrkfl06 opened this issue · comments

Thanks for sharing your great work!

We are reproducing your method for research purposes and noticed that your blog also reports Medusa inference results as a baseline. We tried to measure the speed of both EAGLE and Medusa with Llama2 70B Chat, but the official Medusa repo does not seem to support the Llama2 architecture at inference time (its KV cache implementation may not match Llama2).
We would be grateful if you could share your Medusa inference code for Llama2 70B Chat so that we can cross-check that EAGLE delivers far better acceleration over the baseline models.
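
For reference, here is a minimal sketch of the timing comparison we are trying to run. The model path, prompt, and generation settings are placeholders, and the EAGLE and Medusa generate calls would be swapped in for the plain `model.generate` below; this is only meant to show how we measure tokens per second, not your actual benchmark script.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "meta-llama/Llama-2-70b-chat-hf"  # placeholder model path

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

prompt = "Explain speculative decoding in one paragraph."  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Time vanilla autoregressive decoding as the baseline.
torch.cuda.synchronize()
start = time.time()
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = output_ids.shape[1] - input_ids.shape[1]
print(f"baseline: {new_tokens / elapsed:.2f} tokens/s")

# The same timing would then be repeated with the EAGLE and Medusa
# generation entry points, and the tokens/s ratios compared as speedups.
```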

Thanks for reading.

We didn't report Medusa's inference results on Llama2. The Medusa results on Vicuna were simply copied from Medusa's own technical report. You can ask Medusa's authors for support.

Sorry for the misunderstanding. I'll ask Medusa's authors as you advised. Thanks!