There are 0 repositories under the int4 topic.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
row-major matmul optimization
An innovative library for efficient LLM inference via low-bit quantization
SOTA Weight-only Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
Rust library to write integer types of any bit length into a byte buffer.
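
Several of the entries above rely on storing 4-bit integers compactly, two per byte. The following is a minimal sketch of that idea in Python and is not taken from any of the listed projects. The function names `pack_int4` and `unpack_int4` are hypothetical, and the layout (low nibble first, signed two's-complement values, zero padding for odd-length input) is an assumption; real libraries may order or pad nibbles differently.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) into bytes, two values per byte.

    Assumed layout: the first value of each pair goes in the low nibble,
    the second in the high nibble. Odd-length input is padded with 0.
    """
    out = bytearray()
    for i in range(0, len(values), 2):
        lo = values[i]
        hi = values[i + 1] if i + 1 < len(values) else 0
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        # Masking with 0xF keeps the two's-complement low 4 bits of each value.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data, count):
    """Inverse of pack_int4: recover `count` signed 4-bit integers."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


packed = pack_int4([-8, 7, 3])
restored = unpack_int4(packed, 3)
```

Because the original length is passed back in as `count`, the padding nibble added for odd-length input is dropped on unpacking, so the round trip returns the original values.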