Yingfeng's starred repositories
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
flashinfer
FlashInfer: Kernel Library for LLM Serving
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
SwiftInfer
Efficient AI Inference & Serving
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
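As background for the tagline above, here is a minimal sketch of what low-bit weight quantization means in general (symmetric per-tensor int4 round-to-nearest, in NumPy). This is a generic illustration, not neural-speed's API; the function names are hypothetical.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor quantization of weights to 4-bit integers in [-8, 7]."""
    scale = max(np.abs(w).max() / 7.0, 1e-8)                 # map the largest magnitude onto the int4 range
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # round-to-nearest, then clamp
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the quantized values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and check the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize_int4(q, scale)).max())
```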
concurrent_deferred_rc
Concurrent Deferred Reference Counting
ServerlessLLM
Fast, easy and cost-efficient multi-LLM serving.
fast-multi-join-sketch
Fast Cardinality Estimation of Multi-Join Queries Using Sketches
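For context on the title above, here is a minimal AGMS-style sketch estimator for a single two-way equi-join (a classic baseline, not the paper's multi-join method): each relation keeps the sum of random ±1 signs of its join-key values, the product of the two sums is an unbiased estimate of the join size, and averaging independent copies reduces the variance. All identifiers are illustrative.

```python
import random
from collections import Counter

def make_sign_hash(seed: int):
    """Return a seeded ±1 hash over join-key values (stand-in for a 4-wise independent family)."""
    salt = random.Random(seed).getrandbits(64)
    return lambda v: 1 if hash((salt, v)) & 1 else -1

def agms_join_size(keys_r, keys_s, num_sketches: int = 64) -> float:
    """Estimate |R ⋈ S| on one join attribute using AGMS-style atomic sketches."""
    estimates = []
    for seed in range(num_sketches):
        sign = make_sign_hash(seed)
        x_r = sum(sign(k) for k in keys_r)   # atomic sketch of R
        x_s = sum(sign(k) for k in keys_s)   # atomic sketch of S, same sign hash
        estimates.append(x_r * x_s)          # unbiased estimate of the join size
    return sum(estimates) / len(estimates)   # average independent copies to reduce variance

# Example: compare the sketch estimate against the exact join size.
R = [random.randint(0, 20) for _ in range(5000)]
S = [random.randint(0, 20) for _ in range(5000)]
cnt_s = Counter(S)
exact = sum(c * cnt_s[k] for k, c in Counter(R).items())
print("exact:", exact, "estimate:", round(agms_join_size(R, S)))
```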
exaloglog-paper
ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
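As background for the entry above, here is a minimal HyperLogLog-style counter, the classic relative of ExaLogLog rather than the paper's algorithm: hash each item, route it to one of m registers, record the maximum rank (leading-zero count plus one) per register, and combine the registers with a harmonic mean. Parameter choices and names are illustrative, and the small-cardinality corrections are omitted.

```python
import hashlib

class HyperLogLog:
    """Minimal HyperLogLog-style distinct counter with m = 2**p registers."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item) -> None:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                        # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining bits feed the rank
        rank = (64 - self.p) - rest.bit_length() + 1    # leading zeros in the rest, plus one
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        # Harmonic mean of 2**(-register) values, scaled by alpha * m**2.
        return self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)

# Example: count ~100k distinct items with 1024 one-byte registers.
hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print("estimated distinct count:", round(hll.estimate()))
```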