Yingfeng (yingfeng)

Location: China


Organizations
deepfabric
deepinsight
infiniflow
izenecloud

Yingfeng's starred repositories

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ | License: MIT | Stargazers: 7673 | Issues: 75 | Issues: 151

Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3498 | Issues: 98 | Issues: 141

blazingmq

A modern high-performance open source message queuing system

Language: C++ | License: Apache-2.0 | Stargazers: 2498 | Issues: 27 | Issues: 57

infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense embedding, sparse embedding, tensor, and full-text

Language: C++ | License: Apache-2.0 | Stargazers: 2124 | Issues: 25 | Issues: 300

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

LLMCompiler

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

Language: Python | License: MIT | Stargazers: 1274 | Issues: 18 | Issues: 5

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | Stargazers: 825 | Issues: 13 | Issues: 75

SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python | License: MIT | Stargazers: 603 | Issues: 17 | Issues: 27

FalkorDB

A super fast graph database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).

Language: C | License: NOASSERTION | Stargazers: 525 | Issues: 15 | Issues: 472

SwiftInfer

Efficient AI Inference & Serving

Language: Python | License: Apache-2.0 | Stargazers: 447 | Issues: 5 | Issues: 6

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Language: C++ | License: Apache-2.0 | Stargazers: 320 | Issues: 8 | Issues: 46

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

concurrent_deferred_rc

Concurrent Deferred Reference Counting

Language: C++ | License: MIT | Stargazers: 145 | Issues: 19 | Issues: 4

relbench

RelBench: Relational Deep Learning Benchmark

Language: Python | License: MIT | Stargazers: 135 | Issues: 11 | Issues: 33

ServerlessLLM

Fast, easy and cost-efficient multi-LLM serving.

Language: Python | License: Apache-2.0 | Stargazers: 103 | Issues: 13 | Issues: 6

SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

MSVBASE

MSVBASE is a system that efficiently supports complex queries combining approximate similarity search and relational operators. It integrates high-dimensional vector indices into PostgreSQL, a relational database, to facilitate complex approximate similarity queries.

Language: C++ | License: MIT | Stargazers: 66 | Issues: 7 | Issues: 8

BEBR

Official code for "Binary embedding based retrieval at Tencent"

Language: Python | License: Apache-2.0 | Stargazers: 41 | Issues: 4 | Issues: 3

tunnel

Tunnel is a Pipeline Execution Engine based on C++20 coroutine

Language: C++ | License: Apache-2.0 | Stargazers: 27 | Issues: 1 | Issues: 0

fast-multi-join-sketch

Fast Cardinality Estimation of Multi-Join Queries Using Sketches

Language: Python | License: NOASSERTION | Stargazers: 11 | Issues: 0 | Issues: 0

Language: C++ | License: Apache-2.0 | Stargazers: 9 | Issues: 3 | Issues: 3

bqf

Implementation of a Backpack Quotient Filter

Language: C++ | License: AGPL-3.0 | Stargazers: 9 | Issues: 0 | Issues: 0

exaloglog-paper

ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale

Language: Java | Stargazers: 8 | Issues: 4 | Issues: 0

Language: C++ | License: MIT | Stargazers: 6 | Issues: 1 | Issues: 0

Language: C++ | License: MIT | Stargazers: 3 | Issues: 10 | Issues: 0

Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0