Taoshu (TaoLbr1993)

Company: Alibaba Group


Taoshu's starred repositories


KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python | License: MIT | Stargazers: 191 | Issues: 0
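
To make the idea concrete, below is a minimal sketch of group-wise asymmetric low-bit quantization applied to a key-cache tensor. It is illustrative only, not KIVI's implementation; the group size, tensor shapes, and function names are assumptions.

```python
import torch

def asym_quantize_2bit(x: torch.Tensor, group_size: int = 32):
    # Illustrative group-wise asymmetric quantization to 4 levels (2 bits).
    groups = x.reshape(-1, group_size)
    g_min = groups.min(dim=-1, keepdim=True).values
    g_max = groups.max(dim=-1, keepdim=True).values
    scale = (g_max - g_min).clamp(min=1e-8) / 3.0      # 2 bits -> integer levels 0..3
    q = torch.clamp(torch.round((groups - g_min) / scale), 0, 3).to(torch.uint8)
    return q, scale, g_min                             # keep per-group scale and zero-point

def asym_dequantize_2bit(q, scale, zero, shape):
    return (q.float() * scale + zero).reshape(shape)

k_cache = torch.randn(4, 128, 64)                      # hypothetical (heads, seq_len, head_dim)
q, scale, zero = asym_quantize_2bit(k_cache)
k_approx = asym_dequantize_2bit(q, scale, zero, k_cache.shape)
print((k_cache - k_approx).abs().max())                # reconstruction error of the 2-bit cache
```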

Pytorch-XNOR-Net

XNOR-Net with binary GEMM and binary conv2d kernels, supporting both CPU and GPU.

Language: Python | License: BSD-3-Clause | Stargazers: 78 | Issues: 0
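
As background for what these binary kernels compute, here is a minimal sketch of XNOR-Net-style weight binarization (W ≈ α·sign(W) with one scaling factor per output filter). It is not this repository's code, and the shapes are made up.

```python
import torch
import torch.nn.functional as F

def binarize_weight(w: torch.Tensor) -> torch.Tensor:
    # XNOR-Net-style approximation: W ~= alpha * sign(W), where alpha is the
    # mean absolute value of each output filter (one scale per filter).
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)
    return alpha * torch.sign(w)

# Hypothetical shapes, just to show the substitution at inference time.
w = torch.randn(16, 8, 3, 3)                 # (out_channels, in_channels, kH, kW)
x = torch.randn(1, 8, 32, 32)
y = F.conv2d(x, binarize_weight(w), padding=1)
print(y.shape)                               # torch.Size([1, 16, 32, 32])
```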

binary-networks-pytorch

Binarize convolutional neural networks using PyTorch.

Language: Python | License: BSD-3-Clause | Stargazers: 130 | Issues: 0

kernel_tuner

Kernel Tuner

Language: Python | License: Apache-2.0 | Stargazers: 266 | Issues: 0
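
A usage sketch paraphrasing the Kernel Tuner quickstart; it assumes a CUDA-capable machine and that the tune_kernel signature has not changed. The tuner compiles and benchmarks the kernel once per value of each tunable parameter.

```python
import numpy as np
import kernel_tuner

# CUDA vector-add kernel; block_size_x is injected by the tuner at compile time.
kernel_string = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""

size = 1_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
n = np.int32(size)

tune_params = {"block_size_x": [32, 64, 128, 256, 512]}

# Benchmarks every configuration and reports the best-performing one.
results, env = kernel_tuner.tune_kernel(
    "vector_add", kernel_string, size, [c, a, b, n], tune_params
)
```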

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Language: Cuda | Stargazers: 199 | Issues: 0

XNOR-Net

ImageNet classification using binary Convolutional Neural Networks

Language: Lua | License: NOASSERTION | Stargazers: 857 | Issues: 0

BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Language: Python | License: MIT | Stargazers: 171 | Issues: 0

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.

Language: Python | License: Apache-2.0 | Stargazers: 518 | Issues: 0
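
For intuition about the FP16xINT4 format, here is a numerics-only sketch of group-wise 4-bit weight quantization followed by a matmul on the dequantized weights. Marlin's contribution is fusing the dequantization into a highly optimized GEMM kernel, which this Python sketch does not attempt; names and group size are assumptions.

```python
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 128):
    # Group-wise symmetric quantization to the signed 4-bit range [-8, 7].
    g = w.reshape(-1, group_size)
    scale = g.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(g / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor, shape):
    # On a GPU these would be FP16 weights; float32 here so the sketch runs on CPU.
    return (q.float() * scale).reshape(shape)

w = torch.randn(4096, 4096)
q, scale = quantize_int4(w)
w_deq = dequantize_int4(q, scale, w.shape)
x = torch.randn(16, 4096)            # small batch, the regime the kernel targets
y = x @ w_deq.t()
```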

graphlearn-for-pytorch

A GPU-accelerated graph learning library for PyTorch, facilitating the scaling of GNN training and inference.

Language: Python | License: Apache-2.0 | Stargazers: 111 | Issues: 0

plot_demo

Example experiment figures that can be used in papers.

Language: Python | Stargazers: 184 | Issues: 0
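
In the same spirit, a minimal matplotlib sketch of a paper-style comparison plot; all numbers below are made up and the figure is not taken from this repository.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical comparison of two methods across model sizes.
sizes = np.array([1, 3, 7, 13])                  # model size in billions (made up)
baseline = np.array([62.1, 65.4, 68.9, 71.2])
ours = np.array([63.0, 66.8, 70.5, 72.9])

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(sizes, baseline, "o--", label="Baseline")
ax.plot(sizes, ours, "s-", label="Ours")
ax.set_xlabel("Model size (B parameters)")
ax.set_ylabel("Accuracy (%)")
ax.legend(frameon=False)
fig.tight_layout()
fig.savefig("accuracy_vs_size.pdf")
```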

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python | License: Apache-2.0 | Stargazers: 2116 | Issues: 0

ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Language: Python | License: Apache-2.0 | Stargazers: 6416 | Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stargazers: 4259 | Issues: 0
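
A usage sketch paraphrasing the repository's quickstart; argument names follow the README and may differ across versions, and the model choice is arbitrary.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"        # small model, for illustration
quantized_model_dir = "opt-125m-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
# Calibration examples used by GPTQ to estimate quantization error.
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```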

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 130847 | Issues: 0
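
A minimal usage sketch of the pipeline API; the model and prompt here are arbitrary choices, not anything prescribed by the library.

```python
from transformers import pipeline

# The pipeline wraps tokenizer, model, and decoding behind one call.
generator = pipeline("text-generation", model="gpt2")
out = generator("Quantizing the KV cache lets", max_new_tokens=20)
print(out[0]["generated_text"])
```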

Edge-MoE

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

Language: C++ | Stargazers: 80 | Issues: 0
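
To illustrate the mixture-of-experts routing that this kind of accelerator exploits (only one expert runs per token), here is a tiny top-1 MoE layer in PyTorch; it has nothing to do with the FPGA implementation, and all sizes are made up.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    # Top-1 routing: each token activates exactly one expert, so only a
    # fraction of the layer's weights is touched per token.
    def __init__(self, dim: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        top_score, top_idx = scores.max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e
            if sel.any():
                out[sel] = top_score[sel, None] * expert(x[sel])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)        # torch.Size([10, 64])
```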

LLMSpeculativeSampling

Fast inference from large language models via speculative decoding.

Language: Python | License: Apache-2.0 | Stargazers: 466 | Issues: 0
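
The core accept/reject rule of speculative sampling can be sketched in a few lines (illustrative, not this repository's code): the draft model proposes a token from its distribution q, the target model scores it with p, and a rejected draft token is replaced by a sample from the residual distribution max(0, p - q).

```python
import torch

def speculative_step(p: torch.Tensor, q: torch.Tensor, draft_token: int) -> int:
    # p: target-model token distribution, q: draft-model distribution,
    # draft_token: the token the draft model proposed from q.
    accept_prob = torch.clamp(p[draft_token] / q[draft_token], max=1.0)
    if torch.rand(()) < accept_prob:
        return draft_token                       # keep the cheap draft token
    # Rejected: resample from the residual distribution max(0, p - q), renormalized.
    residual = torch.clamp(p - q, min=0.0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, 1))

# Toy distributions over a 5-token vocabulary (made up for illustration).
p = torch.tensor([0.1, 0.4, 0.2, 0.2, 0.1])
q = torch.tensor([0.3, 0.3, 0.2, 0.1, 0.1])
print(speculative_step(p, q, draft_token=0))
```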

PygHO

A library for subgraph GNNs based on PyG (PyTorch Geometric).

Language: Python | License: MIT | Stargazers: 37 | Issues: 0

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python | License: Apache-2.0 | Stargazers: 9109 | Issues: 0

sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Language: Python | License: Apache-2.0 | Stargazers: 679 | Issues: 0
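
For contrast, the naive one-shot baseline that SparseGPT improves on is plain magnitude pruning, sketched below; SparseGPT itself solves a layer-wise reconstruction problem with approximate second-order information, which this sketch does not implement.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # One-shot magnitude pruning: zero out the smallest-|w| entries of a layer.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, sparsity=0.5)
print((w_sparse == 0).float().mean())        # ~0.5 of the entries are now zero
```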

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 9730 | Issues: 0

metaseq

Repo for external large-scale work

Language: Python | License: MIT | Stargazers: 6439 | Issues: 0

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | Stargazers: 26170 | Issues: 0

vsc-cec-ide

A plugin that turns your VSCode into a "domestically made" edition, derived from CEC-IDE, with features such as sensitive-word detection and anti-addiction limits.

Language: TypeScript | License: Apache-2.0 | Stargazers: 777 | Issues: 0

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | Stargazers: 55179 | Issues: 0

pretrain-gnns

Strategies for Pre-training Graph Neural Networks

Language: Python | License: MIT | Stargazers: 954 | Issues: 0

SUN

Understanding and Extending Subgraph GNNs by Rethinking their Symmetries (NeurIPS 2022 Oral)

Language: Python | License: MIT | Stargazers: 39 | Issues: 0