neural-network compression acceleration tensor-decomposition pruning architecture-search knowledge-distillation sparsification low-rank

Model Compression and Acceleration Progress

Repository to track the progress in model compression and acceleration

Low-rank approximation

T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor (CVPR 2019) paper
MUSCO: Multi-Stage COmpression of neural networks (ICCVW 2019) paper | code (PyTorch)
Efficient Neural Network Compression (CVPR 2019) paper | code (Caffe)
Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling (ICLR 2019) paper | code (PyTorch)
Extreme Network Compression via Filter Group Approximation (ECCV 2018) paper
Ultimate tensorization: compressing convolutional and FC layers alike (NIPS 2016 workshop) paper | code (TensorFlow) | code (MATLAB, Theano + Lasagne)
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications (ICLR 2016) paper
Accelerating Very Deep Convolutional Networks for Classification and Detection (IEEE TPAMI 2016) paper
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition (ICLR 2015) paper | code (Caffe)
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation (NIPS 2014) paper
Speeding up Convolutional Neural Networks with Low Rank Expansions (2014) paper
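
The papers above share one basic idea: replace a large weight matrix or tensor with a product of smaller factors. As a rough illustration only, not taken from any listed paper, the sketch below counts parameters when a dense `m x n` weight matrix is replaced by two factors of shapes `m x rank` and `rank x n`. The function names (`low_rank_param_count`, `compression_ratio`, `max_useful_rank`) are hypothetical, and biases are ignored.

```python
def low_rank_param_count(m: int, n: int, rank: int) -> int:
    """Parameters in the two factors U (m x rank) and V (rank x n)."""
    if not 1 <= rank <= min(m, n):
        raise ValueError("rank must satisfy 1 <= rank <= min(m, n)")
    return rank * (m + n)


def compression_ratio(m: int, n: int, rank: int) -> float:
    """Dense parameter count divided by the factorized parameter count."""
    return (m * n) / low_rank_param_count(m, n, rank)


def max_useful_rank(m: int, n: int) -> int:
    """Largest rank for which the factorized layer is strictly smaller.

    Solves rank * (m + n) < m * n for the largest integer rank.
    A result of 0 means no rank gives a saving.
    """
    return (m * n - 1) // (m + n)


if __name__ == "__main__":
    # Assumed example: a 4096 x 4096 fully connected layer at rank 256.
    print(low_rank_param_count(4096, 4096, 256))
    print(compression_ratio(4096, 4096, 256))
    print(max_useful_rank(4096, 4096))
```

The count only says how small the factorized layer is. How much accuracy survives at a given rank depends on the method and usually on fine-tuning, which is what the papers in this list address.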

Pruning & Sparsification

Papers

Rethinking the Value of Network Pruning (ICLR 2019, NIPS 2018 workshop) paper | code (PyTorch)
Dynamic Channel Pruning: Feature Boosting and Suppression (ICLR 2019) paper | code
AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference (2019) paper
CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization (CVPR 2018) paper
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks (IJCAI 2018) paper | code and models (PyTorch)
Discrimination-aware Channel Pruning for Deep Neural Networks (NIPS 2018) paper | code and pretrained models (PyTorch)
AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV 2018) paper | code (PyTorch) | pretrained models (PyTorch, TensorFlow, TensorFlow Lite)
Channel Gating Neural Networks (2018) paper
DSD: Dense-Sparse-Dense Training for Deep Neural Networks (ICLR 2017) paper | pretrained models (Caffe)
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV 2017) paper | code and pretrained models (Caffe) | code (PyTorch)
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017) paper | code (Torch, PyTorch)
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression (ICCV 2017) paper | pretrained model (Caffe) | code (PyTorch)
Structured Bayesian Pruning via Log-Normal Multiplicative Noise (NIPS 2017) paper | code (TensorFlow, Theano + Lasagne)
SphereFace: Deep Hypersphere Embedding for Face Recognition (CVPR 2017) paper | code and pretrained models (Caffe)
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR 2016) paper
Fast ConvNets Using Group-wise Brain Damage (CVPR 2016) paper

Repos

Pruning + quantization code and pretrained models (TensorFlow, TensorFlow Lite). Examples for CIFAR.

Knowledge distillation

Papers

Learning Efficient Detector with Semi-supervised Adaptive Distillation (arxiv 2019) paper | code (Caffe)
Model compression via distillation and quantization (ICLR 2018) paper | code (PyTorch)
Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (ICLR 2018 workshop) paper
Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (BMVC 2018) paper
Net2Net: Accelerating Learning via Knowledge Transfer (ICLR 2016) paper
Distilling the Knowledge in a Neural Network (NIPS 2014) paper
FitNets: Hints for Thin Deep Nets (2014) paper | code (Theano + Pylearn2)

Repos

TensorFlow implementation of three papers: https://github.com/chengshengchan/model_compression (results for CIFAR-10)

Quantization

Bayesian Bits: Unifying Quantization and Pruning (2020) paper
Up or Down? Adaptive Rounding for Post-Training Quantization (2020) paper
Gradient $\ell_1$ Regularization for Quantization Robustness (ICLR 2020) paper
Training Binary Neural Networks with Real-to-Binary Convolutions (ICLR 2020) paper | code (coming soon)
Data-Free Quantization Through Weight Equalization and Bias Correction (ICCV 2019) paper | code (PyTorch)
XNOR-Net++ (2019) paper
Matrix and tensor decompositions for training binary neural networks (2019) paper
XNOR-Net (ECCV 2016) paper | code (PyTorch)
Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks (2019) paper | code (TensorFlow)
Relaxed Quantization for Discretized Neural Networks (ICLR 2019) paper
Training and Inference with Integers in Deep Neural Networks (ICLR 2018) paper | code (TensorFlow)
Training Quantized Nets: A Deeper Understanding (NIPS 2017) paper
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (2017) paper
Deep Learning with Limited Numerical Precision (2015) paper
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (2013) paper

Architecture search

MobileNets
- Searching for MobileNetV3 (ICCV 2019) paper
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (CVPR 2018) paper | code and pretrained models (TensorFlow)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019) paper | code and pretrained models (TensorFlow)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (CVPR 2019) paper | code (TensorFlow)
MorphNet: Fast & Simple Resource-Constrained Learning of Deep Network Structure (CVPR 2018) paper | code (TensorFlow)
ShuffleNets
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (ECCV 2018) paper
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (CVPR 2018) paper
Multi-Fiber Networks for Video Recognition (ECCV 2018) paper | code (PyTorch)
IGCVs
- IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks (BMVC 2018) paper | code and pretrained models (MXNet)
- IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (CVPR 2018) paper
- Interleaved Group Convolutions for Deep Neural Networks (ICCV 2017) paper

PhD theses and overviews

Quantizing deep convolutional networks for efficient inference: A whitepaper (2018) paper
Algorithms for speeding up convolutional neural networks (2018) thesis
Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges (2018) paper
Efficient methods and hardware for deep learning (2017) thesis

Frameworks

MUSCO - framework for model compression using tensor decompositions (PyTorch, TensorFlow)
AIMET - AI Model Efficiency Toolkit (PyTorch, TensorFlow)
Distiller - package for compression using pruning and low-precision arithmetic (PyTorch)
MorphNet - framework for neural networks architecture learning (TensorFlow)
Mayo - deep learning framework with fine- and coarse-grained pruning, network slimming, and quantization methods
PocketFlow - framework for model pruning, sparsification, quantization (TensorFlow implementation)
Keras compressor - compression using low-rank approximations (SVD for matrices, Tucker for tensors)
Caffe compressor - K-means based quantization
gemmlowp - Building a quantization paradigm from first principles (C++)
NNI - framework for feature engineering, NAS, hyperparameter tuning, and model compression
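
To make the "K-means based quantization" entry above concrete, here is a minimal pure-Python sketch of weight sharing: the scalar weights are clustered with Lloyd's algorithm, and each weight is then stored as a small index into a codebook of centroids. This is an illustration under stated assumptions, not the code of Caffe compressor or any other listed framework. The function names, the evenly spaced centroid initialization, and the fixed iteration count are all choices made for this example.

```python
def assign_to_nearest(weights, centroids):
    """Index of the closest centroid for every weight."""
    return [
        min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
        for w in weights
    ]


def kmeans_quantize(weights, num_clusters, num_iters=20):
    """Cluster scalar weights; return (codebook, per-weight indices)."""
    if num_clusters < 1 or not weights:
        raise ValueError("need at least one cluster and one weight")
    lo, hi = min(weights), max(weights)
    if num_clusters == 1:
        centroids = [(lo + hi) / 2.0]
    else:
        # Assumption: centroids start evenly spaced over the weight range.
        step = (hi - lo) / (num_clusters - 1)
        centroids = [lo + step * i for i in range(num_clusters)]

    for _ in range(num_iters):
        assignments = assign_to_nearest(weights, centroids)
        for c in range(num_clusters):
            members = [w for w, a in zip(weights, assignments) if a == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = sum(members) / len(members)

    return centroids, assign_to_nearest(weights, centroids)


def dequantize(centroids, assignments):
    """Rebuild approximate weights from the codebook and the indices."""
    return [centroids[a] for a in assignments]


if __name__ == "__main__":
    weights = [-1.0, -0.9, -1.1, 1.0, 0.9, 1.1]
    codebook, indices = kmeans_quantize(weights, num_clusters=2)
    print(codebook, indices, dequantize(codebook, indices))
```

With `k` clusters each weight needs only about `log2(k)` bits for its index plus the shared codebook, which is where the storage saving comes from.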

Comparison of different approaches

Please see comparative_results.pdf

Similar repos
