Li Xingjian's starred repositories
tensorrtllm_backend
The Triton TensorRT-LLM Backend
ThunderKittens
Tile primitives for speedy kernels
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
llm-scheduling-artifact
Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"
experiments
My explorations of new technologies.
attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
public-apis
A collective list of free APIs
mistral-inference
Official inference library for Mistral models