Kevinzz's starred repositories
llm-action
This project aims to share technical principles and hands-on experience related to large language models.
flashinfer
FlashInfer: Kernel Library for LLM Serving
MInference
To speed up long-context LLM inference, MInference computes attention with approximate and dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
LLM-Viewer
Analyze the inference of Large Language Models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.
Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions welcome!
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
llm-inference-benchmark
LLM Inference benchmark
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models