Zhubo Shi (cerebellumking)


Company: Tongji University

Location: Shanghai, China


Zhubo Shi's starred repositories

the-art-of-command-line

Master the command line, in one page

Stargazers: 153116 · Issues: 0

mac-setup

Installing a development environment on macOS

Language: Shell · License: NOASSERTION · Stargazers: 7178 · Issues: 0

llm-action

This project aims to share the technical principles behind large language models, along with hands-on experience.

Language: HTML · License: Apache-2.0 · Stargazers: 9410 · Issues: 0

speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

Language: Python · License: MIT · Stargazers: 195 · Issues: 0

MediaCrawler

Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.

Language: Python · License: NOASSERTION · Stargazers: 16805 · Issues: 0

lnav

Log file navigator

Language: C++ · License: BSD-2-Clause · Stargazers: 7828 · Issues: 0

LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Language: Python · License: Apache-2.0 · Stargazers: 528 · Issues: 0
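Several of the starred repositories above and below center on speculative decoding. For context, the accept/reject rule at its core can be sketched in a few lines of plain Python. Everything here is a toy illustration (the vocabulary, the two hard-coded "models", and all function names are invented for this sketch, not taken from any of the listed repositories):

```python
import random

random.seed(0)

# Toy vocabulary and two "models": each maps (prefix, token) to the
# probability of that token. The draft model is a cheap approximation
# of the target model; both ignore the prefix to keep the sketch small.
VOCAB = ["a", "b", "c"]

def target_prob(prefix, token):
    return {"a": 0.6, "b": 0.3, "c": 0.1}[token]

def draft_prob(prefix, token):
    return {"a": 0.5, "b": 0.3, "c": 0.2}[token]

def sample(dist_fn, prefix):
    """Draw one token from the distribution dist_fn(prefix, ·)."""
    r, acc = random.random(), 0.0
    for tok in VOCAB:
        acc += dist_fn(prefix, tok)
        if r < acc:
            return tok
    return VOCAB[-1]

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target.

    A drafted token x is kept with probability min(1, p(x)/q(x)),
    where p is the target and q the draft distribution; on the first
    rejection we resample from the residual max(0, p - q)."""
    drafted = []
    for _ in range(k):
        drafted.append(sample(draft_prob, prefix + drafted))
    out = []
    for x in drafted:
        ctx = prefix + out
        p, q = target_prob(ctx, x), draft_prob(ctx, x)
        if random.random() < min(1.0, p / q):
            out.append(x)  # accepted: distributed as a target sample
        else:
            # Rejected: resample from the normalized residual max(0, p - q).
            residual = {t: max(0.0, target_prob(ctx, t) - draft_prob(ctx, t))
                        for t in VOCAB}
            z = sum(residual.values())
            out.append(sample(lambda c, t: residual[t] / z, ctx))
            break
    else:
        # All drafts accepted: sample one bonus token from the target.
        out.append(sample(target_prob, prefix + out))
    return out
```

The point of the scheme is that all k draft tokens can be verified by the target model in a single parallel forward pass, while the output remains distributed exactly as if the target model had decoded alone.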

Awesome-LLMs-on-device

Awesome LLMs on Device: A Comprehensive Survey

License: MIT · Stargazers: 775 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 4334 · Issues: 0

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ · License: Apache-2.0 · Stargazers: 1667 · Issues: 0

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · Stargazers: 8855 · Issues: 0

BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Language: Python · License: Apache-2.0 · Stargazers: 85 · Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2231 · Issues: 0

REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

Language: C · License: Apache-2.0 · Stargazers: 163 · Issues: 0

LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Language: Python · License: Apache-2.0 · Stargazers: 1111 · Issues: 0

Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Language: Python · License: Apache-2.0 · Stargazers: 166 · Issues: 0

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 · Stargazers: 377 · Issues: 0

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 168 · Issues: 0

prompt-cache

Modular and structured prompt caching for low-latency LLM inference

Language: Python · License: MIT · Stargazers: 48 · Issues: 0

llumnix

Efficient and easy multi-instance LLM serving

Language: Python · License: Apache-2.0 · Stargazers: 137 · Issues: 0

inference

Reference implementations of MLPerf™ inference benchmarks

Language: Python · License: Apache-2.0 · Stargazers: 1197 · Issues: 0

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python · License: BSD-3-Clause · Stargazers: 5572 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 2580 · Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stargazers: 4374 · Issues: 0

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 574 · Issues: 0

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Language: Python · Stargazers: 64 · Issues: 0