Yi Wang's repositories
alpaca.cpp-ios
Locally run an instruction-tuned, chat-style LLM
Adv360-Pro-ZMK
Production repository for the all-new Advantage360 Professional using the ZMK engine
cpuinfo
CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)
iree
A retargetable MLIR-based machine learning compiler and runtime toolkit
iree-for-apple-platforms
This project builds the IREE compiler for macOS and the IREE runtime for macOS, iOS, watchOS, and tvOS
jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
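A hedged sketch of the integration pattern, mirroring the project's canonical add-kernel example (exact signatures may drift between releases): a Triton kernel is invoked from JAX via jax_triton.triton_call.

    import jax
    import jax.numpy as jnp
    import jax_triton as jt
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, block_size: tl.constexpr):
        # Each program instance handles one block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * block_size + tl.arange(0, block_size)
        tl.store(out_ptr + offsets, tl.load(x_ptr + offsets) + tl.load(y_ptr + offsets))

    def add(x, y, block_size=8):
        out_shape = jax.ShapeDtypeStruct(shape=x.shape, dtype=x.dtype)
        return jt.triton_call(x, y, kernel=add_kernel, out_shape=out_shape,
                              grid=(x.size // block_size,), block_size=block_size)

    print(add(jnp.arange(8, dtype=jnp.float32), jnp.arange(8, dtype=jnp.float32)))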
JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
lam2s
lam2s = Lean And Mean LAnguage Model Serving
Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
ml_collections
ML Collections is a library of Python collections designed for ML use cases.
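A minimal sketch of the core ConfigDict pattern; the field names here are illustrative.

    from ml_collections import config_dict

    cfg = config_dict.ConfigDict()
    cfg.learning_rate = 3e-4             # fields are added as attributes
    cfg.model = config_dict.ConfigDict()
    cfg.model.hidden_size = 512
    cfg.lock()                           # locking freezes the set of keys
    cfg.learning_rate = 1e-4             # updating an existing field is still allowed
    print(cfg)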
mlx
MLX: An array framework for Apple silicon
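A tiny sketch of the NumPy-like API; MLX arrays are lazily evaluated.

    import mlx.core as mx

    a = mx.array([1.0, 2.0, 3.0])
    b = mx.exp(a) + a    # builds a computation graph lazily
    mx.eval(b)           # forces evaluation
    print(b)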
mlx-examples
Examples in the MLX framework
mlx-lm
Run LLMs with MLX
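A hedged usage sketch; the model repository name below is an illustrative placeholder.

    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer, prompt="Write a haiku about the sea",
                   max_tokens=64))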
pytorch_memlab
Profiling and inspecting memory in PyTorch
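A short sketch using the MemReporter API; it assumes a CUDA device, and the model is a toy example.

    import torch
    from pytorch_memlab import MemReporter

    model = torch.nn.Linear(1024, 1024).cuda()
    reporter = MemReporter(model)
    loss = model(torch.randn(64, 1024, device="cuda")).sum()
    loss.backward()
    reporter.report()    # per-tensor breakdown of device memory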
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
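A minimal sketch of training and applying a tokenizer model, assuming a local corpus.txt exists.

    import sentencepiece as spm

    spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="m",
                                   vocab_size=8000)
    sp = spm.SentencePieceProcessor(model_file="m.model")
    print(sp.encode("Hello world", out_type=str))  # subword pieces
    print(sp.encode("Hello world"))                # integer ids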
sglang
SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define large language models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
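A hedged sketch of the high-level LLM API described above; the model name and sampling parameters are illustrative.

    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds an engine on first use
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)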
tensorrtllm_backend
The Triton TensorRT-LLM Backend
torchft
PyTorch per-step fault tolerance (actively under development)
xgrammar
Fast, Flexible and Portable Structured Generation