yiliu30

Yi Liu's repositories

ao

The torchao repository contains api's and workflows for quantization and pruning gpu models.

Language:PythonBSD-3-Clause000

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language:PythonApache-2.0000

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.

Language:PythonApache-2.0000

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language:C++Apache-2.0000

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language:PythonNOASSERTION000

accelerate

🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

Apache-2.0000

ai-pr-reviewer

AI-based Pull Request Summarizer and Reviewer with Chat Capabilities.

Language:TypeScriptMIT000

auto-round

SOTA Weight-only Quantization Algorithm for LLMs

Language:PythonApache-2.0000

awesome-model-quantization

A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are continuously improving the project. Welcome to PR the works (papers, repositories) that are missed by the repo.

000

CodeXGLUE

Language:C#MIT000

gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

Apache-2.0000

gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Language:PythonBSD-3-Clause000

hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language:PythonApache-2.0000

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Language:C++Apache-2.0000

ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Apache-2.0000