Repositories under the post-training-quantization topic.
micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2) pruning: normal, regular, and group convolutional channel pruning; 3) group convolution structure; 4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), dynamic shape.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
A model compression and acceleration toolbox based on PyTorch.
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit"
[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Notes on quantization in neural networks
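The core arithmetic behind most of the PTQ repositories above is affine quantization: map floats onto an integer grid via a scale and a zero-point. A minimal pure-Python sketch (function names are illustrative, not from any listed repo) of asymmetric INT8 quantization:

```python
def quantize_int8(values, qmin=-128, qmax=127):
    """Asymmetric affine quantization: q = clamp(round(x / scale) + zero_point)."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # range must include 0 so it maps exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero tensor
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats: x ~ (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]

weights = [-0.62, 0.0, 0.37, 1.04]      # toy example values
q, s, zp = quantize_int8(weights)
approx = dequantize(q, s, zp)           # round-trip error is bounded by the scale
```

Note that 0.0 round-trips exactly (it lands on the zero-point), which is why the clamped range is forced to include zero.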
Post-training static quantization using ResNet18 architecture
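The static-quantization workflow that repo follows (observe activations on calibration data, then swap in int8 kernels) can be sketched with PyTorch's eager-mode API; a tiny conv net stands in for ResNet18 here, and random tensors stand in for a real calibration set:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qconfig, prepare, convert)

class TinyNet(nn.Module):
    """Toy stand-in for ResNet18; the stubs mark the int8 region of the graph."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()        # float -> int8 at the model input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()    # int8 -> float at the model output
    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")  # x86 int8 backend
prepared = prepare(model)                      # insert observers
with torch.no_grad():                          # calibration pass
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))
quantized = convert(prepared)                  # replace modules with int8 kernels
out = quantized(torch.randn(1, 3, 32, 32))
```

The calibration loop is what makes this *static* quantization: activation ranges are fixed from observed data rather than computed per inference.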
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
Pytorch implementation of our paper accepted by ECCV 2022-- Fine-grained Data Distribution Alignment for Post-Training Quantization
Improves the performance of 8-bit PTQ4DM, especially on FID.
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
This sample shows how to convert TensorFlow model to OpenVINO IR model and how to quantize OpenVINO model.
Post-training quantization of an NVIDIA NeMo ASR model.
Implementation of EPTQ - an Enhanced Post-Training Quantization algorithm for DNN compression
Quantization examples for PTQ & QAT.
Model quantization with PyTorch, TensorFlow & Larq.
Generating a TensorRT model from ONNX.
Low-bit (2/4/8/16) Post Training Quantization for ResNet20
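The low-bit (2/4/8/16) settings in that repo reduce, for per-tensor symmetric quantization, to shrinking the signed integer grid as the bit-width drops. A plain-Python sketch (illustrative, not code from the repo):

```python
def symmetric_quantize(values, bits):
    """Per-tensor symmetric quantization onto a signed `bits`-wide grid."""
    qmax = 2 ** (bits - 1) - 1                 # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard: all-zero tensor
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

sample = [-1.5, -0.2, 0.0, 0.9]
for bits in (2, 4, 8, 16):
    q, scale = symmetric_quantize(sample, bits)
    recon = [v * scale for v in q]             # dequantize; error shrinks with bits
```

Symmetric schemes skip the zero-point entirely (zero maps to integer 0), which is why weight quantization at these bit-widths usually prefers them.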
The repository covers a research work published in MDPI Sensors and provides details about the project.
Post-training quantization (PTQ) method for improving LLMs. Unofficial implementation of https://arxiv.org/abs/2309.02784
Comprehensive study on the quantization of various CNN models, employing techniques such as Post-Training Quantization and Quantization Aware Training (QAT).
EfficientNetV2 (EfficientNetV2-B2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset: fine-tuning, augmentation, handling class imbalance, etc.
Quantization for Object Detection in Tensorflow 2.x
Research experiment archive for post-training quantization with TensorRT. Accepted to IEEE EDGE 2024.
A framework to train a ResUNet architecture, then quantize, compile, and execute it on an FPGA.
Post-training quantization performed on a model trained on the CLIC dataset.