SafeAILab / EAGLE

Official Implementation of EAGLE

Home Page: https://arxiv.org/abs/2401.15077


About reproducing baseline results

dydrkfl06 opened this issue · comments

Thanks for sharing your great work!

We are reproducing your method for research purposes and noticed that your blog also reports Medusa inference results as a baseline. We tried to measure the speed of both EAGLE and Medusa with Llama2 70B Chat, but the official Medusa repo does not seem to support the Llama2 architecture at inference time (its KV cache implementation may not match Llama2).
We would be grateful if you could share your Medusa inference code for Llama2 70B Chat so that we can cross-check that EAGLE delivers far better acceleration over the baseline models.
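
For reference, here is a minimal sketch of the timing comparison we are trying to run. The model path, prompt, and generation settings are placeholders, and the EAGLE and Medusa generate calls would be swapped in for the plain `model.generate` below; this is only meant to show how we measure tokens per second, not your actual benchmark script.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "meta-llama/Llama-2-70b-chat-hf"  # placeholder model path

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

prompt = "Explain speculative decoding in one paragraph."  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Time vanilla autoregressive decoding as the baseline.
torch.cuda.synchronize()
start = time.time()
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = output_ids.shape[1] - input_ids.shape[1]
print(f"baseline: {new_tokens / elapsed:.2f} tokens/s")

# The same timing would then be repeated with the EAGLE and Medusa
# generation entry points, and the tokens/s ratios compared as speedups.
```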

Thanks for reading.

We didn't report Medusa's inference results on Llama2. The Medusa results on Vicuna were simply copied from Medusa's own technical report. You can ask Medusa's authors for support.

Sorry for the misunderstanding. I'll ask Medusa's authors as you advised. Thanks!