hanrui1sensetime

hanrui1sensetime's starred repositories

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook | Apache-2.0 | 194500

This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.

Language: Python | Apache-2.0 | 10400

llm_interview_note

Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

Language: HTML | 101400

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python | Apache-2.0 | 27800

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda | 19500

GPTQ-for-PULSE

4 bits quantization of PULSE models using GPTQ

Language: Python | Apache-2.0 | 200

QUIK

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference

Language: C++ | Apache-2.0 | 15900

mamba

Mamba SSM architecture

Language: Python | Apache-2.0 | 1094000

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python | MIT | 133600

RETFound_MAE

RETFound - A foundation model for retinal image

Language: Jupyter Notebook | NOASSERTION | 6000

RETFound_MAE

Language: Python | NOASSERTION | 1800

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python | Apache-2.0 | 194000