pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Can you report the running time on hardware?

qiuzh20 opened this issue

Thank you to the authors for providing a method to transform a dense model into an MoE for more efficient inference!

MoEfication reports acceleration results for the transformed model on both CPU and GPU, whereas the current LLaMA-MoE technical report does not include this information. Could the authors provide the corresponding numbers for reference? A sketch of the kind of measurement I mean is below.
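
For concreteness, here is a minimal sketch of how one might time generation with Hugging Face `transformers`, assuming a released LLaMA-MoE checkpoint (the name `llama-moe/LLaMA-MoE-v1-3_5B-2_8` and the use of `trust_remote_code=True` are assumptions; substitute whichever checkpoint and settings were used in the report):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name for illustration; replace with the actual release being benchmarked.
MODEL_NAME = "llama-moe/LLaMA-MoE-v1-3_5B-2_8"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

prompt = "The mixture-of-experts architecture speeds up inference by"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Warm-up run so one-time setup costs do not skew the measurement.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=32)

# Timed run: wall-clock decoding throughput in tokens per second.
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tokens/s")
```

A comparable run on the original dense LLaMA under the same settings (batch size, sequence length, dtype, hardware) would make the CPU/GPU speedup directly comparable to the MoEfication numbers.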