Yingfeng's starred repositories
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
flashinfer
FlashInfer: Kernel Library for LLM Serving
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
SwiftInfer
Efficient AI Inference & Serving
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
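As background for the tagline above, here is a minimal sketch of what low-bit weight quantization means in general (symmetric per-tensor int4 round-to-nearest, in NumPy). This is a generic illustration, not neural-speed's API; the function names are hypothetical.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor quantization of weights to 4-bit integers in [-8, 7]."""
    scale = max(np.abs(w).max() / 7.0, 1e-8)                 # map the largest magnitude onto the int4 range
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # round-to-nearest, then clamp
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the quantized values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and check the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize_int4(q, scale)).max())
```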
concurrent_deferred_rc
Concurrent Deferred Reference Counting
ServerlessLLM
Fast, easy and cost-efficient multi-LLM serving.
fast-multi-join-sketch
Fast Cardinality Estimation of Multi-Join Queries Using Sketches
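For context on the title above, here is a minimal AGMS-style sketch estimator for a single two-way equi-join (a classic baseline, not the paper's multi-join method): each relation keeps the sum of random ±1 signs of its join-key values, the product of the two sums is an unbiased estimate of the join size, and averaging independent copies reduces the variance. All identifiers are illustrative.

```python
import random
from collections import Counter

def make_sign_hash(seed: int):
    """Return a seeded ±1 hash over join-key values (stand-in for a 4-wise independent family)."""
    salt = random.Random(seed).getrandbits(64)
    return lambda v: 1 if hash((salt, v)) & 1 else -1

def agms_join_size(keys_r, keys_s, num_sketches: int = 64) -> float:
    """Estimate |R ⋈ S| on one join attribute using AGMS-style atomic sketches."""
    estimates = []
    for seed in range(num_sketches):
        sign = make_sign_hash(seed)
        x_r = sum(sign(k) for k in keys_r)   # atomic sketch of R
        x_s = sum(sign(k) for k in keys_s)   # atomic sketch of S, same sign hash
        estimates.append(x_r * x_s)          # unbiased estimate of the join size
    return sum(estimates) / len(estimates)   # average independent copies to reduce variance

# Example: compare the sketch estimate against the exact join size.
R = [random.randint(0, 20) for _ in range(5000)]
S = [random.randint(0, 20) for _ in range(5000)]
cnt_s = Counter(S)
exact = sum(c * cnt_s[k] for k, c in Counter(R).items())
print("exact:", exact, "estimate:", round(agms_join_size(R, S)))
```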
exaloglog-paper
ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
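As background for the entry above, here is a minimal HyperLogLog-style counter, the classic relative of ExaLogLog rather than the paper's algorithm: hash each item, route it to one of m registers, record the maximum rank (leading-zero count plus one) per register, and combine the registers with a harmonic mean. Parameter choices and names are illustrative, and the small-cardinality corrections are omitted.

```python
import hashlib

class HyperLogLog:
    """Minimal HyperLogLog-style distinct counter with m = 2**p registers."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item) -> None:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                        # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining bits feed the rank
        rank = (64 - self.p) - rest.bit_length() + 1    # leading zeros in the rest, plus one
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        # Harmonic mean of 2**(-register) values, scaled by alpha * m**2.
        return self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)

# Example: count ~100k distinct items with 1024 one-byte registers.
hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print("estimated distinct count:", round(hll.estimate()))
```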