hanrui1sensetime's starred repositories
llmc
This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
llm_interview_note
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers
GPTQ-for-PULSE
4-bit quantization of PULSE models using GPTQ
RETFound_MAE
RETFound - A foundation model for retinal images
awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large models
text-generation-inference
Large Language Model Text Generation Inference
FlashAttention20Triton
Triton implementation of FlashAttention 2.0
RPTQ-for-LLaMA
Efficient 3-bit/4-bit quantization of LLaMA models
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models