There are 25 repositories under the post-training-quantization topic.
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) at high bit widths (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low bit widths (≤2b: ternary and binary, TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) to 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusing for quantization. Deployment: TensorRT with FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
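The batch-normalization fusing mentioned above is a standard pre-quantization step. As a minimal illustration (not micronet's own code), folding a BatchNorm layer into the preceding convolution in PyTorch looks roughly like this; `fuse_conv_bn` is a hypothetical helper name:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the preceding conv (assumes groups=1, dilation=1).

    w' = w * gamma / sqrt(var + eps)
    b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused
```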
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
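This description matches Intel's neural-compressor. If so, a static post-training INT8 quantization call looks roughly like the sketch below, assuming the 2.x Python API; `fp32_model` and `calib_loader` are placeholders:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# fp32_model: a trained framework model (e.g. a torch.nn.Module)
# calib_loader: a DataLoader yielding calibration batches; both are placeholders
q_model = quantization.fit(
    model=fp32_model,
    conf=PostTrainingQuantConfig(approach="static"),
    calib_dataloader=calib_loader,
)
q_model.save("./quantized_model")
```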
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
SqueezeLLM: Dense-and-Sparse Quantization
A model compression and acceleration toolbox based on PyTorch.
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
This repository contains notebooks demonstrating how to quantize deep neural networks with TensorFlow Lite.
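For reference, a typical TensorFlow Lite full-integer PTQ flow (independent of these notebooks) is sketched below; the SavedModel path, input shape, and sample count are placeholder assumptions:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield ~100 calibration samples shaped like the model input (224x224x3 assumed).
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # enables integer calibration
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```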
Post-training static quantization using the ResNet18 architecture.
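A minimal eager-mode sketch of this flow, assuming torchvision's quantizable ResNet18; `calibration_loader` is a placeholder:

```python
import torch
from torch.ao.quantization import get_default_qconfig, prepare, convert
from torchvision.models.quantization import resnet18

model = resnet18(weights="DEFAULT", quantize=False).eval()
model.fuse_model()                             # fuse conv+bn(+relu) modules
model.qconfig = get_default_qconfig("fbgemm")  # x86 server backend
prepared = prepare(model)                      # insert observers
with torch.no_grad():
    for images, _ in calibration_loader:       # calibration_loader: placeholder
        prepared(images)                       # collect activation ranges
quantized = convert(prepared)                  # swap modules for int8 kernels
```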
Notes on quantization in neural networks
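The core identity behind most such notes is uniform affine quantization, q = clamp(round(x/s) + z, qmin, qmax), with dequantization x̂ = s(q − z). A self-contained numerical sketch:

```python
import numpy as np

def quantize_affine(x: np.ndarray, num_bits: int = 8):
    """Uniform affine (asymmetric) quantization: q = round(x / scale) + zero_point."""
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize_affine(x)
print("max abs error:", np.abs(x - dequantize(q, s, z)).max())  # roughly scale/2
```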
Improves the performance of 8-bit PTQ4DM, especially on FID.
This sample shows how to convert a TensorFlow model to an OpenVINO IR model and how to quantize the OpenVINO model.
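A rough sketch of the modern (2023+) OpenVINO flow for this; the sample itself may use the older Model Optimizer/POT tools, and the paths and `calibration_items` are placeholders:

```python
import openvino as ov
import nncf

# Convert a TensorFlow SavedModel (or ONNX file) to OpenVINO IR.
ov_model = ov.convert_model("saved_model_dir")   # placeholder path

# Post-training quantization with NNCF over a calibration dataset.
calib = nncf.Dataset(calibration_items)          # calibration_items: placeholder iterable
quantized = nncf.quantize(ov_model, calib)
ov.save_model(quantized, "model_int8.xml")
```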
Post-training quantization of an NVIDIA NeMo ASR model.
Implementation of EPTQ, an enhanced post-training quantization algorithm for DNN compression.
Quantization examples for PTQ and QAT.
Generating a TensorRT engine from an ONNX model.
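A minimal sketch of this conversion via the TensorRT Python API (TensorRT 8.x assumed; file names are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # drop this flag for pure FP32
engine_bytes = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:        # placeholder path
    f.write(engine_bytes)
```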
This repository accompanies a research paper published in MDPI Sensors and provides details about the project.
Model quantization with PyTorch, TensorFlow, and Larq.
Low-bit (2/4/8/16-bit) post-training quantization for ResNet20.
Quantization for object detection in TensorFlow 2.x.
A comprehensive study of quantization across various CNN models, employing techniques such as post-training quantization (PTQ) and quantization-aware training (QAT).
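To complement the PTQ sketch earlier in this list, a minimal quantization-aware-training loop in PyTorch (torchvision ≥ 0.13 assumed; `train_one_epoch` is a placeholder):

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat, convert
from torchvision.models.quantization import resnet18

model = resnet18(weights="DEFAULT", quantize=False).train()
model.fuse_model(is_qat=True)                      # QAT-aware module fusion
model.qconfig = get_default_qat_qconfig("fbgemm")
model = prepare_qat(model)                         # insert fake-quant modules

for epoch in range(3):                             # short fine-tune with fake quant
    train_one_epoch(model)                         # train_one_epoch: placeholder

model.eval()
quantized = convert(model)                         # finalize the int8 model
```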
A framework to train a ResUNet architecture, then quantize, compile, and execute it on an FPGA.
EfficientNetV2 (efficientnetv2-b2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset: fine-tuning, augmentation, handling class imbalance, etc.
A post-training quantization (PTQ) method for improving LLMs. Unofficial implementation of https://arxiv.org/abs/2309.02784
Post-training quantization performed on a model trained on the CLIC dataset.