pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Can you report the running time on hardware?

qiuzh20 opened this issue

Thank you to the authors for providing a method to transform a dense model into an MoE for more efficient inference!

MoEfication reports acceleration results for the transformed model on both CPU and GPU, whereas the current LLaMA-MoE technical report does not include this information. Could the authors provide the corresponding numbers for reference? A sketch of the kind of measurement I mean is below.
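
For concreteness, here is a minimal sketch of how one might time generation with Hugging Face `transformers`, assuming a released LLaMA-MoE checkpoint (the name `llama-moe/LLaMA-MoE-v1-3_5B-2_8` and the use of `trust_remote_code=True` are assumptions; substitute whichever checkpoint and settings were used in the report):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name for illustration; replace with the actual release being benchmarked.
MODEL_NAME = "llama-moe/LLaMA-MoE-v1-3_5B-2_8"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

prompt = "The mixture-of-experts architecture speeds up inference by"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Warm-up run so one-time setup costs do not skew the measurement.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=32)

# Timed run: wall-clock decoding throughput in tokens per second.
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tokens/s")
```

A comparable run on the original dense LLaMA under the same settings (batch size, sequence length, dtype, hardware) would make the CPU/GPU speedup directly comparable to the MoEfication numbers.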