SunshineZhang's repositories

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
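
Since the description centers on the launch/train workflow, here is a minimal sketch of the library's core loop; the toy model, optimizer, and data are placeholders.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # reads device/distributed settings from the environment

# Toy model/optimizer/data, standing in for a real training setup.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device and wraps it for DDP/FSDP/DeepSpeed.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward() so mixed precision works
    optimizer.step()
```

The same script then runs unmodified under `accelerate launch` across the device and distributed configurations the description lists.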

ANT-Quantization

LLM inference: OliVe

Stargazers: 0 · Issues: 0

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

License: MIT · Stargazers: 0 · Issues: 0

bitsandbytes

LLM: 8-bit CUDA functions for PyTorch

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
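
As a concrete example of those 8-bit functions, a minimal sketch of swapping an fp16 linear layer for the library's 8-bit module (needs a CUDA GPU; the layer sizes are arbitrary):

```python
import torch
import bitsandbytes as bnb

# A regular linear layer, and its drop-in 8-bit replacement.
fp16_linear = torch.nn.Linear(4096, 4096)
int8_linear = bnb.nn.Linear8bitLt(4096, 4096, has_fp16_weights=False)
int8_linear.load_state_dict(fp16_linear.state_dict())
int8_linear = int8_linear.cuda()  # weights are quantized to int8 on transfer

x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
y = int8_linear(x)  # int8 matmul with fp16 outlier handling under the hood
```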

clash

A rule-based tunnel in Go.

License: GPL-3.0 · Stargazers: 0 · Issues: 0

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores, via the WMMA API and MMA PTX instructions.

Language: Cuda · License: MIT · Stargazers: 0 · Issues: 0

data-parallel-CPP

Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).

License: NOASSERTION · Stargazers: 0 · Issues: 0

DeepSpeed

LLM: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
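
A minimal sketch of the initialization step, assuming a plain PyTorch model; the ds_config values are illustrative, not a tuned configuration.

```python
import torch
import deepspeed

model = torch.nn.Linear(16, 2)  # placeholder model
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-3}},
}

# initialize() returns an engine that handles ZeRO partitioning, mixed
# precision, and gradient accumulation; train with model_engine.backward(loss)
# and model_engine.step().
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```

Such a script is normally started with the `deepspeed` launcher rather than plain `python`.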

dpctl

Python SYCL bindings and SYCL-based Python Array API library

License: Apache-2.0 · Stargazers: 0 · Issues: 0
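
A small sketch of what the array API side looks like, as I understand the dpctl.tensor namespace; it assumes a working SYCL runtime (e.g., Intel oneAPI) and a supported device.

```python
import dpctl
import dpctl.tensor as dpt

print(dpctl.select_default_device())  # the SYCL device USM arrays will live on

x = dpt.arange(10, dtype="float32")   # USM-backed array on that device
y = dpt.sum(x * 2)
print(dpt.asnumpy(y))                 # copy back to a NumPy array on the host
```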

FasterTransformer

Transformer-related optimizations, including BERT and GPT

License: Apache-2.0 · Stargazers: 0 · Issues: 0

FlexGen

LLM: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

llama.cpp

LLM inference in C/C++

License: MIT · Stargazers: 0 · Issues: 0
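
llama.cpp itself is C/C++; to keep this page's examples in Python, here is a sketch via the separate llama-cpp-python bindings, with a hypothetical local GGUF model path.

```python
from llama_cpp import Llama  # pip install llama-cpp-python (a separate project)

# Hypothetical path to a quantized GGUF model file.
llm = Llama(model_path="./models/llama-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: What does 4-bit quantization trade away? A:", max_tokens=64)
print(out["choices"][0]["text"])
```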

llm-awq

LLM: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
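
This is not the llm-awq API, just a NumPy sketch of the paper's core idea: scale salient weight channels (chosen by activation magnitude) up before round-to-nearest quantization so they lose less precision. The fixed exponent and the single per-tensor quantizer here are simplifications of the real method.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256))       # (in_features, out_features)
X = rng.normal(size=(128, 512))
X[:, :8] *= 10.0                      # a few salient input channels

s = np.abs(X).mean(axis=0) ** 0.5     # per-channel scale; the exponent is grid-searched in AWQ

def rtn(w, bits=4):
    """Round-to-nearest quantization with a single per-tensor step size."""
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max() / qmax
    return np.round(w / step) * step

W_naive = rtn(W)
W_awq = rtn(W * s[:, None]) / s[:, None]  # scale up, quantize, fold the scale back out

# Output error on the calibration activations drops for the scaled version.
print(np.abs(X @ W_naive - X @ W).mean(), np.abs(X @ W_awq - X @ W).mean())
```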

LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

LLMBox

Large Language Models (companion code for the 2024 Renmin University of China edition)

License: MIT · Stargazers: 0 · Issues: 0

llvm

Intel staging area for llvm.org contributions. Home for Intel LLVM-based projects.

License: NOASSERTION · Stargazers: 0 · Issues: 0

lm-evaluation-harness

LLM: A framework for few-shot evaluation of autoregressive language models.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
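
A minimal sketch of the harness's Python entry point; simple_evaluate is the name in recent (0.4-era) releases, and the model and task choices are just examples.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # any causal LM checkpoint
    tasks=["hellaswag"],
    num_fewshot=5,
)
print(results["results"])
```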

LMOps

General technology for enabling AI capabilities with LLMs and MLLMs

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

mlc-llm

Enable everyone to develop, optimize, and deploy AI models natively on their own devices.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

neural-compressor

LLM: Provides unified APIs for SOTA model compression techniques, such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
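
A sketch of the 2.x-style post-training quantization entry point, under the assumption that fit() with a PostTrainingQuantConfig and a calibration dataloader is the intended flow; the model and data here are placeholders.

```python
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
dataset = torch.utils.data.TensorDataset(torch.randn(32, 16), torch.zeros(32))
calib_loader = torch.utils.data.DataLoader(dataset, batch_size=8)

# fit() calibrates on the dataloader and returns an INT8-quantized model.
q_model = fit(model=model, conf=PostTrainingQuantConfig(), calib_dataloader=calib_loader)
```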

pybind11

Seamless operability between C++11 and Python

License: NOASSERTION · Stargazers: 0 · Issues: 0

qlora

LLM: QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook · License: MIT · Stargazers: 0 · Issues: 0
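
A minimal sketch of a QLoRA-style setup with the Hugging Face stack (transformers + peft + bitsandbytes); the gpt2 checkpoint is just a small stand-in.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # the NormalFloat4 data type from the paper
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_config)

# Frozen 4-bit base weights; only the LoRA adapters receive gradients.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```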

QUIK

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

smoothquant

LLM: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
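
Not the repo's API, just a NumPy sketch of the paper's smoothing step, which migrates activation outliers into the weights with s_j = max|X_j|^alpha / max|W_j|^(1-alpha).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 512))   # activations, columns = channels
X[:, :4] *= 50.0                  # a few outlier channels
W = rng.normal(size=(512, 256))   # (in_features, out_features)

alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_smooth = X / s                  # folded into the previous op in practice
W_smooth = W * s[:, None]         # equivalently diag(s) @ W

# The layer's output is unchanged, but X_smooth is far easier to quantize.
assert np.allclose(X @ W, X_smooth @ W_smooth)
print(np.abs(X).max(axis=0)[:6], np.abs(X_smooth).max(axis=0)[:6])
```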

SpQR

LLM: SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
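
Not the SpQR implementation (which uses group-wise, bilevel quantization), just a NumPy sketch of the representation's core split: a dense low-bit matrix plus a small sparse set of outlier weights kept at high precision.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
rows, cols = rng.integers(0, 256, 50), rng.integers(0, 256, 50)
W[rows, cols] *= 30.0                    # plant some hard-to-quantize outliers

def rtn(w, bits=3):
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max() / qmax
    return np.round(w / step) * step

err = np.abs(rtn(W) - W)
mask = err > np.quantile(err, 0.99)      # ~1% worst weights become outliers

W_hat = rtn(np.where(mask, 0.0, W))      # dense 3-bit part, outliers excluded
W_hat[mask] = W[mask]                    # outliers kept exactly, stored sparsely
print(np.abs(W_hat - W).mean(), np.abs(rtn(W) - W).mean())
```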

tabby

A terminal for a more modern age

License: MIT · Stargazers: 0 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

License: Apache-2.0 · Stargazers: 0 · Issues: 0
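
Recent releases also expose a high-level LLM class on top of that Python API; a hedged sketch, assuming a supported NVIDIA GPU, with the model and parameter names as examples only.

```python
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for the given checkpoint.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(max_tokens=64)
for out in llm.generate(["What does an inference engine optimize?"], params):
    print(out.outputs[0].text)
```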

wanda

LLM Pruning: A simple and effective LLM pruning approach.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
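
Not the repo's code, just a NumPy sketch of the Wanda score from the paper, |W_ij| * ||X_j||_2, applied here as unstructured 50% pruning within each output row.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))    # (out_features, in_features)
X = rng.normal(size=(1024, 512))   # calibration activations

score = np.abs(W) * np.linalg.norm(X, axis=0)   # broadcast per input channel
k = W.shape[1] // 2                             # drop the bottom half per row
thresh = np.partition(score, k, axis=1)[:, k:k + 1]
W_pruned = np.where(score >= thresh, W, 0.0)
print((W_pruned == 0).mean())                   # ~0.5 sparsity
```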

YuLan-Chat

Large Language Models (companion code for the 2024 Renmin University of China edition)

License: MIT · Stargazers: 0 · Issues: 0