SunshineZhang's repositories
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
ANT-Quantization
LLM inference: OliVe
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
bitsandbytes
LLM: 8-bit CUDA functions for PyTorch
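As a rough illustration of what an 8-bit quantizer does, here is a pure-Python toy of symmetric absmax int8 quantization. This is only a sketch of the idea; bitsandbytes implements it (plus outlier handling) in fused CUDA kernels, and the function names below are made up for illustration.

```python
# Toy symmetric (absmax) int8 quantization: scale so the largest
# magnitude maps to 127, round to integers, dequantize by rescaling.

def quantize_absmax(xs):
    """Map floats to int8 codes via scale = absmax / 127."""
    scale = max(abs(v) for v in xs) / 127.0
    return [round(v / scale) for v in xs], scale

def dequantize(qs, scale):
    """Recover approximate floats from the int8 codes."""
    return [q * scale for q in qs]

codes, scale = quantize_absmax([1.0, -0.5, 0.25])
approx = dequantize(codes, scale)  # each value recovered to within one scale step
```

The round trip loses at most half a quantization step per value, which is why 8-bit weights are usually accurate enough for inference.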
clash
A rule-based tunnel in Go.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.
data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
DeepSpeed
LLM: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
dpctl
Python SYCL bindings and SYCL-based Python Array API library
FasterTransformer
Transformer-related optimizations, including BERT and GPT
FlexGen
LLM: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
llama.cpp
LLM inference in C/C++
llm-awq
LLM: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
LLMBox
Large Language Models (2024 Renmin University of China edition; companion code resources)
llvm
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
lm-evaluation-harness
LLM: A framework for few-shot evaluation of autoregressive language models.
LMOps
General technology for enabling AI capabilities with LLMs and MLLMs
mlc-llm
Enables everyone to develop, optimize, and deploy AI models natively on their own devices.
neural-compressor
LLM: Provides unified APIs for SOTA model compression techniques, such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
pybind11
Seamless operability between C++11 and Python
qlora
LLM: QLoRA: Efficient Finetuning of Quantized LLMs
QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference
smoothquant
LLM: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
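SmoothQuant's core trick is to migrate activation outliers into the weights with a per-channel scale, leaving the matrix product unchanged but making the activations easier to quantize. A minimal pure-Python sketch of that idea (shapes and names here are illustrative, not the repo's API):

```python
# Per-channel migration: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).
# Dividing activations by s and multiplying weight rows by s leaves
# x @ W unchanged while shrinking the activation outlier channel.

def smooth_scales(act_max, w_max, alpha=0.5):
    """Migration factor per input channel."""
    return [(a ** alpha) / (w ** (1 - alpha)) for a, w in zip(act_max, w_max)]

x = [8.0, 0.5]                  # one activation vector; channel 0 is an outlier
w = [[0.25, 0.0], [0.0, 2.0]]   # weight matrix, rows = input channels
s = smooth_scales([abs(v) for v in x],
                  [max(abs(v) for v in row) for row in w])

x_s = [v / sj for v, sj in zip(x, s)]                # smoothed activations
w_s = [[v * sj for v in row] for sj, row in zip(s, w)]  # compensated weights
# (x / s) @ (diag(s) W) == x @ W, but max|x_s| is far smaller than max|x|
```

After smoothing, both activations and weights have moderate ranges, so plain per-tensor int8 quantization works for both.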
SpQR
LLM: SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
tabby
A terminal for a more modern age
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
wanda
LLM pruning: A simple and effective LLM pruning approach.
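Wanda's pruning score is simple enough to show in a few lines: score each weight by its magnitude times the L2 norm of its input feature over a few calibration samples, then drop the lowest scores per output row. A toy pure-Python version (the repo operates on PyTorch layers; these helper names are made up):

```python
# Wanda-style scoring: score_ij = |W_ij| * ||X_j||_2, where ||X_j||_2
# is the norm of input feature j across calibration samples.

def feature_norms(activations):
    """L2 norm of each input feature over the calibration samples."""
    n_feat = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) ** 0.5
            for j in range(n_feat)]

def wanda_mask(weight_row, norms, keep):
    """Keep the `keep` highest-scoring weights in one output row."""
    scores = [abs(w) * n for w, n in zip(weight_row, norms)]
    threshold = sorted(scores, reverse=True)[keep - 1]
    return [1 if s >= threshold else 0 for s in scores]

acts = [[1.0, 0.1], [2.0, 0.1]]  # two calibration samples, two features
row = [0.5, 3.0]                 # one output row of the weight matrix
mask = wanda_mask(row, feature_norms(acts), keep=1)
```

Note that the large weight 3.0 is the one pruned here: its input feature is nearly always tiny, so it contributes little to the output, whereas plain magnitude pruning would have kept it.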
YuLan-Chat
Large Language Models (2024 Renmin University of China edition; companion code resources)