A project for measuring the real-world performance of Large Language Model (LLM) inference frameworks, inspired by the concepts in DeepSpeed-FastGen.