Kevinzz (KevinZeng08)

Company:Zhejiang University

Location:China Mainland

Kevinzz's starred repositories

piperag

PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design

Language:Python | License:Apache-2.0 | Stargazers:6 | Issues:0

xDiT

A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Language:Python | License:Apache-2.0 | Stargazers:131 | Issues:0

mem0

The memory layer for Personalized AI

Language:Python | Stargazers:17692 | Issues:0

llm-action

This project aims to share the technical principles behind large language models and hands-on experience with them.

Language:HTML | License:Apache-2.0 | Stargazers:8114 | Issues:0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language:Cuda | License:Apache-2.0 | Stargazers:856 | Issues:0

MSVBASE

MSVBASE is a system that efficiently supports complex queries combining approximate similarity search with relational operators. It integrates high-dimensional vector indices into PostgreSQL, a relational database, to facilitate complex approximate similarity queries.

Language:C++ | License:MIT | Stargazers:69 | Issues:0

radient

Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.

Language:Python | License:BSD-2-Clause | Stargazers:240 | Issues:0

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:9 | Issues:0

FastV

[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language:Python | Stargazers:176 | Issues:0

Language:Python | License:NOASSERTION | Stargazers:7 | Issues:0

Language:C++ | License:Apache-2.0 | Stargazers:455 | Issues:0

PGRAG

PGRAG

Language:Python | License:NOASSERTION | Stargazers:32 | Issues:0

DB-GPT

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Language:Python | License:MIT | Stargazers:12687 | Issues:0

MInference

To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language:Python | License:MIT | Stargazers:584 | Issues:0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language:Python | License:Apache-2.0 | Stargazers:12648 | Issues:0

Language:Jupyter Notebook | Stargazers:421 | Issues:0

ESPN-v1

ESPN: Embedding from Storage Pipelined Network. GDS implementation for multi-vector embedding retrieval and bindings.

Language:C++ | License:MIT | Stargazers:10 | Issues:0

LOOK-M

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Language:Python | License:MIT | Stargazers:47 | Issues:0

LLM-Viewer

Analyze the inference of Large Language Models (LLMs), including computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.

Language:Python | License:MIT | Stargazers:238 | Issues:0

flux

A fast communication-overlapping library for tensor parallelism on GPUs.

Language:C++ | License:Apache-2.0 | Stargazers:98 | Issues:0

Efficient-Multimodal-LLMs-Survey

Efficient Multimodal Large Language Models: A Survey

License:Apache-2.0 | Stargazers:192 | Issues:0

Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions are welcome!

Stargazers:117 | Issues:0

Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥

License:MIT | Stargazers:654 | Issues:0

llm-inference-benchmark

LLM Inference benchmark

Language:Python | License:MIT | Stargazers:299 | Issues:0

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License:Apache-2.0 | Stargazers:283 | Issues:0

lectures

Material for cuda-mode lectures

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:1952 | Issues:0

LLM101n

LLM101n: Let's build a Storyteller

Stargazers:25624 | Issues:0

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language:Python | Stargazers:992 | Issues:0