galeselee

Zeyu Li's repositories

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

License: Apache-2.0 · 0 · 0 · 0

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions are welcome!

15700

llama

Inference code for LLaMA models

Language: Python · License: NOASSERTION · 0 · 0 · 0

DVFS_PaperList

Energy is a prominent topic. Dynamic Voltage and Frequency Scaling (DVFS) is a technique for managing CPU and GPU power consumption. This is a paper list on DVFS and power consumption.

1 · 0 · 0

llama-models

Utilities intended for use with Llama models.

License: NOASSERTION · 0 · 0 · 0

galeselee.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript · License: MIT · 0 · 0 · 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · 0 · 0 · 0

tinyllm

FlexLLM is a flexible and tiny LLM serving framework. It is a personal customization of lightllm.

Language: Python · 0 · 0 · 0

VPTQ

VPTQ, a flexible and extreme low-bit quantization algorithm

License: MIT · 0 · 0 · 0

OS-Paper-List

0 · 0 · 0

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python · License: Apache-2.0 · 0 · 0 · 0

MICS6000W

Language: C · 0 · 0 · 0

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python · License: Apache-2.0 · 0 · 0 · 0

LLM-Viewer

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

License: MIT · 0 · 0 · 0