Yuan's starred repositories
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
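For a flavor of that API, here is a minimal sketch assuming the high-level `LLM` entry point from recent TensorRT-LLM releases; the model name and sampling settings are illustrative, not a recommendation.

```python
# Minimal sketch of the high-level Python API; assumes the `LLM`
# convenience class from recent TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # engine is built on first use
params = SamplingParams(temperature=0.8, top_p=0.95)

for out in llm.generate(["What does TensorRT-LLM do?"], params):
    print(out.outputs[0].text)
```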
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
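As a sketch of what that looks like in practice, assuming MII's non-persistent `mii.pipeline` API as shown in the project README (the model name is illustrative, and the response attribute name is assumed from MII's docs):

```python
import mii

# Load a model into a local, non-persistent inference pipeline.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Batched generation over several prompts in one call.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
for r in responses:
    print(r.generated_text)
```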
executorch
On-device AI across mobile, embedded, and edge devices for PyTorch
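A minimal export sketch, assuming the `to_edge` lowering flow from the ExecuTorch docs; the toy module stands in for a real model:

```python
import torch
from executorch.exir import to_edge

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Capture -> lower to the Edge dialect -> serialize for the on-device runtime.
aten = torch.export.export(Add(), (torch.ones(2), torch.ones(2)))
et_program = to_edge(aten).to_executorch()

with open("add.pte", "wb") as f:
    f.write(et_program.buffer)  # .pte file consumed by the ExecuTorch runtime
```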
yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
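Conversion is driven by a small dataset config passed to the bundled utilities jar via `--datasetConfig`. A hedged sketch of that config, with field names following the project README and paths that are illustrative only:

```yaml
sourceFormat: DELTA          # format the table is currently written in
targetFormats:               # metadata layers to generate alongside it
  - ICEBERG
  - HUDI
datasets:
  - tableBasePath: s3://bucket/warehouse/orders
    tableName: orders
```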
flashinfer
FlashInfer: Kernel Library for LLM Serving
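For example, a single-request decode step, assuming the `single_decode_with_kv_cache` kernel from FlashInfer's Python docs; the shapes (32 heads, head dim 128, 4k KV cache) are illustrative:

```python
import torch
import flashinfer

num_heads, head_dim, kv_len = 32, 128, 4096
q = torch.randn(num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

# Fused attention of one query token against the whole KV cache.
o = flashinfer.single_decode_with_kv_cache(q, k, v)
```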
RingAttention
Transformers with Arbitrarily Large Context
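The core idea: each device keeps its query block while key/value blocks rotate around a ring, and partial attention results are merged with an online-softmax accumulator, so context length is bounded by aggregate rather than per-device memory. Below is a toy single-process sketch of that merge step; all shapes and the block count are illustrative, and real RingAttention overlaps the rotation with device-to-device communication:

```python
import numpy as np

def ring_attention(q, k, v, n_blocks=4):
    """Blockwise attention with online-softmax merging (single process)."""
    d = q.shape[-1]
    acc = np.zeros_like(q)                   # running weighted-value sum
    row_max = np.full(q.shape[0], -np.inf)   # running max for numerical stability
    row_sum = np.zeros(q.shape[0])           # running softmax denominator
    for k_blk, v_blk in zip(np.array_split(k, n_blocks),
                            np.array_split(v, n_blocks)):
        s = q @ k_blk.T / np.sqrt(d)              # scores for this KV block
        new_max = np.maximum(row_max, s.max(axis=-1))
        p = np.exp(s - new_max[:, None])
        scale = np.exp(row_max - new_max)         # rescale earlier partials
        acc = acc * scale[:, None] + p @ v_blk
        row_sum = row_sum * scale + p.sum(axis=-1)
        row_max = new_max
    return acc / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
out = ring_attention(q, k, v)  # matches full softmax(q k^T / sqrt(d)) @ v
```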
sort-research-rs
Test and benchmark suite for sort implementations.
libCacheSim
A high-performance cache simulator and library
Gluten-Trino
Gluten: Plugin to Boost Trino's Performance
storage-testbench
A testbench for Google Cloud Storage client libraries.