Zhubo Shi (cerebellumking)


Company: Tongji University

Location: Shanghai, China


Zhubo Shi's starred repositories

the-art-of-command-line

Master the command line, in one page

Stargazers: 153116 · Issues: 0

mac-setup

Installing a development environment on macOS

Language: Shell · License: NOASSERTION · Stargazers: 7178 · Issues: 0

llm-action

This project aims to share the technical principles behind large language models, along with hands-on experience.

Language: HTML · License: Apache-2.0 · Stargazers: 9410 · Issues: 0

speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

Language: Python · License: MIT · Stargazers: 195 · Issues: 0

MediaCrawler

Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.

Language: Python · License: NOASSERTION · Stargazers: 16805 · Issues: 0

lnav

Log file navigator

Language: C++ · License: BSD-2-Clause · Stargazers: 7828 · Issues: 0

LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Language: Python · License: Apache-2.0 · Stargazers: 528 · Issues: 0
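Several of the starred repositories above and below center on speculative decoding. For context, the accept/reject rule at its core can be sketched in a few lines of plain Python. Everything here is a toy illustration (the vocabulary, the two hard-coded "models", and all function names are invented for this sketch, not taken from any of the listed repositories):

```python
import random

random.seed(0)

# Toy vocabulary and two "models": each maps (prefix, token) to the
# probability of that token. The draft model is a cheap approximation
# of the target model; both ignore the prefix to keep the sketch small.
VOCAB = ["a", "b", "c"]

def target_prob(prefix, token):
    return {"a": 0.6, "b": 0.3, "c": 0.1}[token]

def draft_prob(prefix, token):
    return {"a": 0.5, "b": 0.3, "c": 0.2}[token]

def sample(dist_fn, prefix):
    """Draw one token from the distribution dist_fn(prefix, ·)."""
    r, acc = random.random(), 0.0
    for tok in VOCAB:
        acc += dist_fn(prefix, tok)
        if r < acc:
            return tok
    return VOCAB[-1]

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target.

    A drafted token x is kept with probability min(1, p(x)/q(x)),
    where p is the target and q the draft distribution; on the first
    rejection we resample from the residual max(0, p - q)."""
    drafted = []
    for _ in range(k):
        drafted.append(sample(draft_prob, prefix + drafted))
    out = []
    for x in drafted:
        ctx = prefix + out
        p, q = target_prob(ctx, x), draft_prob(ctx, x)
        if random.random() < min(1.0, p / q):
            out.append(x)  # accepted: distributed as a target sample
        else:
            # Rejected: resample from the normalized residual max(0, p - q).
            residual = {t: max(0.0, target_prob(ctx, t) - draft_prob(ctx, t))
                        for t in VOCAB}
            z = sum(residual.values())
            out.append(sample(lambda c, t: residual[t] / z, ctx))
            break
    else:
        # All drafts accepted: sample one bonus token from the target.
        out.append(sample(target_prob, prefix + out))
    return out
```

The point of the scheme is that all k draft tokens can be verified by the target model in a single parallel forward pass, while the output remains distributed exactly as if the target model had decoded alone.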

Awesome-LLMs-on-device

Awesome LLMs on Device: A Comprehensive Survey

License: MIT · Stargazers: 775 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 4334 · Issues: 0

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ · License: Apache-2.0 · Stargazers: 1667 · Issues: 0

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · Stargazers: 8855 · Issues: 0

BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python · License: Apache-2.0 · Stargazers: 85 · Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2231 · Issues: 0

REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

Language: C · License: Apache-2.0 · Stargazers: 163 · Issues: 0

LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Language: Python · License: Apache-2.0 · Stargazers: 1111 · Issues: 0

Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Language: Python · License: Apache-2.0 · Stargazers: 166 · Issues: 0

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 · Stargazers: 377 · Issues: 0

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 168 · Issues: 0

prompt-cache

Modular and structured prompt caching for low-latency LLM inference

Language: Python · License: MIT · Stargazers: 48 · Issues: 0

llumnix

Efficient and easy multi-instance LLM serving

Language: Python · License: Apache-2.0 · Stargazers: 137 · Issues: 0

inference

Reference implementations of MLPerf™ inference benchmarks

Language: Python · License: Apache-2.0 · Stargazers: 1197 · Issues: 0

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python · License: BSD-3-Clause · Stargazers: 5572 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 2580 · Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stargazers: 4374 · Issues: 0

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 574 · Issues: 0

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Language: Python · Stargazers: 64 · Issues: 0