Yuan's starred repositories
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
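For a flavor of that API, here is a minimal sketch assuming the high-level `LLM` entry point from recent TensorRT-LLM releases; the model name and sampling settings are illustrative, not a recommendation.

```python
# Minimal sketch of the high-level Python API; assumes the `LLM`
# convenience class from recent TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # engine is built on first use
params = SamplingParams(temperature=0.8, top_p=0.95)

for out in llm.generate(["What does TensorRT-LLM do?"], params):
    print(out.outputs[0].text)
```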
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
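As a sketch of what that looks like in practice, assuming MII's non-persistent `mii.pipeline` API as shown in the project README (the model name is illustrative, and the response attribute name is assumed from MII's docs):

```python
import mii

# Load a model into a local, non-persistent inference pipeline.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Batched generation over several prompts in one call.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
for r in responses:
    print(r.generated_text)
```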
executorch
On-device AI across mobile, embedded, and edge devices for PyTorch
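A minimal export sketch, assuming the `to_edge` lowering flow from the ExecuTorch docs; the toy module stands in for a real model:

```python
import torch
from executorch.exir import to_edge

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Capture -> lower to the Edge dialect -> serialize for the on-device runtime.
aten = torch.export.export(Add(), (torch.ones(2), torch.ones(2)))
et_program = to_edge(aten).to_executorch()

with open("add.pte", "wb") as f:
    f.write(et_program.buffer)  # .pte file consumed by the ExecuTorch runtime
```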
yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
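Conversion is driven by a small dataset config passed to the bundled utilities jar via `--datasetConfig`. A hedged sketch of that config, with field names following the project README and paths that are illustrative only:

```yaml
sourceFormat: DELTA          # format the table is currently written in
targetFormats:               # metadata layers to generate alongside it
  - ICEBERG
  - HUDI
datasets:
  - tableBasePath: s3://bucket/warehouse/orders
    tableName: orders
```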
flashinfer
FlashInfer: Kernel Library for LLM Serving
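For example, a single-request decode step, assuming the `single_decode_with_kv_cache` kernel from FlashInfer's Python docs; the shapes (32 heads, head dim 128, 4k KV cache) are illustrative:

```python
import torch
import flashinfer

num_heads, head_dim, kv_len = 32, 128, 4096
q = torch.randn(num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

# Fused attention of one query token against the whole KV cache.
o = flashinfer.single_decode_with_kv_cache(q, k, v)
```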
RingAttention
Transformers with Arbitrarily Large Context
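The core idea: each device keeps its query block while key/value blocks rotate around a ring, and partial attention results are merged with an online-softmax accumulator, so context length is bounded by aggregate rather than per-device memory. Below is a toy single-process sketch of that merge step; all shapes and the block count are illustrative, and real RingAttention overlaps the rotation with device-to-device communication:

```python
import numpy as np

def ring_attention(q, k, v, n_blocks=4):
    """Blockwise attention with online-softmax merging (single process)."""
    d = q.shape[-1]
    acc = np.zeros_like(q)                   # running weighted-value sum
    row_max = np.full(q.shape[0], -np.inf)   # running max for numerical stability
    row_sum = np.zeros(q.shape[0])           # running softmax denominator
    for k_blk, v_blk in zip(np.array_split(k, n_blocks),
                            np.array_split(v, n_blocks)):
        s = q @ k_blk.T / np.sqrt(d)              # scores for this KV block
        new_max = np.maximum(row_max, s.max(axis=-1))
        p = np.exp(s - new_max[:, None])
        scale = np.exp(row_max - new_max)         # rescale earlier partials
        acc = acc * scale[:, None] + p @ v_blk
        row_sum = row_sum * scale + p.sum(axis=-1)
        row_max = new_max
    return acc / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
out = ring_attention(q, k, v)  # matches full softmax(q k^T / sqrt(d)) @ v
```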
sort-research-rs
Test and benchmark suite for sort implementations.
libCacheSim
A high-performance cache simulator and library
Gluten-Trino
Gluten: Plugin to Boost Trino's Performance
storage-testbench
A testbench for Google Cloud Storage client libraries.