Zeyu Li's repositories
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models (LLMs) has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions are welcome!
llama
Inference code for LLaMA models
DVFS_PaperList
Energy is a topic that attracts considerable attention. Dynamic Voltage and Frequency Scaling (DVFS) is a technique for managing CPU and GPU power consumption. Here is a list of papers on DVFS and power consumption.
llama-models
Utilities intended for use with Llama models.
galeselee.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
tinyllm
FlexLLM is a flexible and tiny LLM serving framework, a personal customization of lightllm.
VPTQ
VPTQ, A Flexible and Extreme low-bit quantization algorithm
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
sarathi-serve
A low-latency & high-throughput serving engine for LLMs
LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
llama3
The official Meta Llama 3 GitHub site
flash-attention
Fast and memory-efficient exact attention
perf-book
The book "Performance Analysis and Tuning on Modern CPUs"
galeselee
The description card
llm.c
LLM training in simple, raw C/CUDA
cutlass
CUDA Templates for Linear Algebra Subroutines
flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
CutlassHelloWorld
This is a repo for Cutlass learning.
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch