UranusSeven / Effective-LLM-Inference-Evaluation

A project for measuring the real-world performance of Large Language Model (LLM) inference frameworks, inspired by the concepts in DeepSpeed-FastGen.