intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Home Page: https://intel.github.io/neural-compressor/
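The sketch below shows how the library's documented post-training quantization entry point (`neural_compressor.quantization.fit` with `PostTrainingQuantConfig`, from the classic 2.x PyTorch API) is typically used; the toy model and random calibration data are placeholders, not part of the project.

```python
# Minimal sketch: post-training static INT8 quantization with the
# Neural Compressor 2.x PyTorch API. The model and calibration data
# below are illustrative stand-ins, not project code.
import torch
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Toy FP32 model standing in for a real network.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

# Random calibration data as (input, label) pairs for calibration passes.
calib_data = TensorDataset(torch.randn(32, 64), torch.randint(0, 10, (32,)))
calib_loader = DataLoader(calib_data, batch_size=8)

# Default config performs post-training static INT8 quantization.
conf = PostTrainingQuantConfig()
q_model = fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)

# Persist the quantized model to a directory.
q_model.save("./int8_model")
```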
