Yuhong Li's starred repositories
text-generation-inference
Large Language Model Text Generation Inference
LLMTest_NeedleInAHaystack
Doing simple retrieval from LLMs at various context lengths to measure accuracy
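The needle-in-a-haystack idea is to bury one known fact ("the needle") at a chosen depth inside long filler context and then check whether the model can retrieve it. A minimal prompt-construction sketch, assuming made-up filler text, needle, and question (the repo's actual harness drives a real model and sweeps depths and context lengths):

```python
def build_haystack_prompt(needle, depth_fraction, filler_sentence, n_sentences=100):
    """Insert `needle` at a relative depth inside repeated filler context.

    depth_fraction: 0.0 places the needle at the start, 1.0 at the end.
    """
    filler = [filler_sentence] * n_sentences
    pos = int(depth_fraction * n_sentences)
    haystack = filler[:pos] + [needle] + filler[pos:]
    context = " ".join(haystack)
    return f"{context}\n\nQuestion: what is the secret number?"

# Hypothetical example: needle buried halfway into the context.
prompt = build_haystack_prompt(
    needle="The secret number is 7421.",
    depth_fraction=0.5,
    filler_sentence="The sky was clear over the bay that morning.",
)
```

Scoring then amounts to sending `prompt` to the model under test and checking whether its answer contains the needle's fact.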
Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
fstattention
Memory bandwidth efficient sparse tree attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
mlx-examples
Examples in the MLX framework
PPO-PyTorch
Minimal implementation of clipped-objective Proximal Policy Optimization (PPO) in PyTorch
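The clipped surrogate objective that this repo implements can be sketched in a few lines. This is a per-sample, dependency-free illustration, not the repo's PyTorch code; `eps` and the sample numbers are illustrative assumptions:

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate loss (negated objective, to be minimized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Take the pessimistic (smaller) objective; negate for gradient descent.
    return -min(unclipped, clipped)

# With a positive advantage, a ratio above 1 + eps is clipped at 1 + eps:
loss = ppo_clipped_loss(logp_new=0.5, logp_old=0.0, advantage=1.0, eps=0.2)
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is the core stabilizing trick of PPO.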
mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
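At its core, the SSM that mamba-minimal implements is a linear recurrence h_t = A·h_{t-1} + B·x_t with readout y_t = C·h_t, scanned over the sequence. A scalar-state Python sketch of that scan, assuming made-up coefficients (Mamba's real parameters are learned, input-dependent, and vector-valued):

```python
def ssm_scan(x, a, b, c, h0=0.0):
    """Sequential scan of a 1-D state-space model.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h, ys = h0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

# An impulse input shows the state decaying geometrically with factor a:
ys = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
# -> [2.0, 1.0, 0.5]
```

Mamba's contribution is making (A, B, C) functions of the input and computing this scan efficiently on GPU; the recurrence itself stays this simple.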
lm-evaluation-harness
A framework for few-shot evaluation of language models.