Inference Systems for Foundation Models
Running large language models on a single GPU for throughput-oriented scenarios.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.