Kevinzz's starred repositories
llm-action
This project aims to share technical principles and hands-on experience related to large language models.
flashinfer
FlashInfer: Kernel Library for LLM Serving
MInference
To speed up long-context LLM inference, MInference computes attention with approximate and dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
LLM-Viewer
Analyze the inference of Large Language Models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.
Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions welcome!
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
llm-inference-benchmark
LLM Inference benchmark
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models