There are 28 repositories under the quantization topic.
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), both high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary/binary schemes (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) to 8-bit via TensorRT; (2) pruning: normal, regular, and group-convolutional channel pruning; (3) group convolution structures; (4) batch-normalization fusing for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
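Quantization-aware training, as offered by libraries like this one, typically inserts "fake quantization" into the forward pass so training sees rounded values while gradients still flow. A minimal library-independent sketch of a uniform affine quantizer with a straight-through estimator (the bit width and min/max calibration are illustrative):

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate uniform affine quantization in the forward pass.

    round() is non-differentiable, so the straight-through estimator
    (STE) lets gradients pass through the rounding unchanged.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_dq = (q - zero_point) * scale       # dequantize back to float
    return x + (x_dq - x).detach()        # forward: x_dq, backward: identity

# Weights keep their gradient even though the forward pass is quantized
w = torch.randn(4, 4, requires_grad=True)
fake_quantize(w).sum().backward()
print(w.grad)  # all ones: the gradient passed straight through the rounding
```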
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
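This is the TensorFlow Model Optimization Toolkit; its Keras API can wrap a model so that training simulates int8 inference. A minimal sketch (the model architecture is illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A small illustrative Keras model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model with fake-quantization ops for quantization-aware training
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
qat_model.summary()
```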
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
PaddleSlim is an open-source library for deep model compression and architecture search.
Inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
Trainable models and NN optimization tools
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
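A hedged sketch of dynamic int8 quantization with Optimum's ONNX Runtime backend (the checkpoint name is illustrative, and the exact API may differ across Optimum versions):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint

# Export the Transformers model to ONNX, then apply dynamic int8 quantization
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="quantized_model", quantization_config=qconfig)
```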
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance.
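A sketch of Neural Compressor's post-training quantization entry point, assuming the 2.x `fit` API (the toy model and calibration data are placeholders for your own):

```python
import torch
import torch.nn as nn
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Toy FP32 model and calibration data; real use would pass your own
fp32_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
calib_data = [(torch.randn(1, 16), 0) for _ in range(10)]
calib_loader = torch.utils.data.DataLoader(calib_data, batch_size=1)

# Post-training static quantization: calibrate, then return an int8 model
q_model = fit(
    model=fp32_model,
    conf=PostTrainingQuantConfig(approach="static"),
    calib_dataloader=calib_loader,
)
q_model.save("./int8_model")
```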
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) the repo has missed are welcome.
A list of high-quality (newest) AutoML works and lightweight models, covering (1) Neural Architecture Search, (2) Lightweight Structures, (3) Model Compression, Quantization and Acceleration, (4) Hyperparameter Optimization, and (5) Automated Feature Engineering.
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Embedded and mobile deep learning research resources
A Python package that extends official PyTorch to easily obtain better performance on Intel platforms
Palette quantization library that powers pngquant and other PNG optimizers
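Palette quantization maps an image's colors onto a small fixed palette, which is what pngquant does for PNGs. As a generic illustration only (using Pillow's built-in median-cut quantizer rather than this library's own API; requires a recent Pillow for the `Image.Quantize` enum):

```python
from PIL import Image

img = Image.open("input.png").convert("RGB")
# Map the image onto a 256-color palette via median cut
paletted = img.quantize(colors=256, method=Image.Quantize.MEDIANCUT)
paletted.save("output.png")
```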
Neural Network Compression Framework for enhanced OpenVINO™ inference
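A hedged sketch of NNCF's post-training quantization entry point, assuming the `nncf.quantize` API of recent 2.x releases (`model` and `val_loader` are user-supplied placeholders):

```python
import nncf

# nncf.Dataset wraps a data source plus a transform that extracts model inputs;
# `model` and `val_loader` are assumed to be defined by the user.
calibration_dataset = nncf.Dataset(val_loader, transform_func=lambda batch: batch[0])
quantized_model = nncf.quantize(model, calibration_dataset)
```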
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
Fast inference engine for Transformer models
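This description matches CTranslate2, which can quantize weights at model-load time. Assuming that, a minimal sketch (the model directory and tokens are illustrative; the model must first be converted with the ct2 converters):

```python
import ctranslate2

# "ct2_model/" is an illustrative path to a converted model directory;
# compute_type="int8" requests 8-bit weight quantization at load time.
translator = ctranslate2.Translator("ct2_model", device="cpu", compute_type="int8")
result = translator.translate_batch([["▁Hello", "▁world"]])
print(result[0].hypotheses[0])
```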
A tool for converting onnx->keras or onnx->tflite. If the tool is useful to you, please star it.
Must-read papers on deep learning to hash (DeepHash)
Infrastructures™ for Machine Learning Training/Inference in Production.
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
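This repo's own API is not shown here; as a generic stand-in illustrating low-precision quantization in PyTorch, here is PyTorch's built-in dynamic quantization (not this library's API):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Quantize Linear weights to int8; activations are quantized on the fly at runtime
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)
```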
Awesome machine learning model compression research papers, tools, and learning material.