EfficientDNNs

A collection of recent methods on DNN compression and acceleration. There are mainly 5 kinds of methods for efficient DNNs:

neural architecture re-design or search (NAS)
- maintain accuracy, less cost (e.g., #Params, #FLOPs, etc.): MobileNet, ShuffleNet etc.
- maintain cost, more accuracy: Inception, ResNeXt, Xception etc.
pruning (including structured and unstructured)
quantization
matrix/low-rank decomposition
knowledge distillation (KD)

Note, this repo is more about pruning (with lottery ticket hypothesis or LTH as a sub-topic), KD, and quantization. For other topics like NAS, see more comprehensive collections (## Related Repos and Websites) at the end of this file. Welcome to send a pull request if you'd like to add any pertinent papers.

Other repos:

LTH (lottery ticket hypothesis) and its broader version, pruning at initialization (PaI), now is at the frontier of network pruning. We single out the PaI papers to this repo. Welcome to check it out!
Awesome-Efficient-ViT for a curated list of efficient vision transformers.

About abbreviation: In the list below, o for oral, s for spotlight, b for best paper, w for workshop.

Surveys

1993-TNN-Pruning Algorithms -- A survey
2017-Proceedings of the IEEE-Efficient Processing of Deep Neural Networks: A Tutorial and Survey [2020 Book: Efficient Processing of Deep Neural Networks]
2017.12-A survey of FPGA-based neural network accelerator
2018-FITEE-Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
2018-IEEE Signal Processing Magazine-Model compression and acceleration for deep neural networks: The principles, progress, and challenges. Arxiv extension
2018.8-A Survey on Methods and Theories of Quantized Neural Networks
2019-JMLR-Neural Architecture Search: A Survey
2020-MLSys-What is the state of neural network pruning
2019.02-The State of Sparsity in Deep Neural Networks
2021-TPAMI-Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks
2021-IJCV-Knowledge Distillation: A Survey
2020-Proceedings of the IEEE-Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
2020-Pattern Recognition-Binary neural networks: A survey
2021-TPDS-The Deep Learning Compiler: A Comprehensive Survey
2021-JMLR-Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
2022-IJCAI-Recent Advances on Neural Network Pruning at Initialization
2021.6-Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Papers [Pruning and Quantization]

1980s,1990s

1988-NIPS-A back-propagation algorithm with optimal use of hidden units
1988-NIPS-Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment
1988-NIPS-What Size Net Gives Valid Generalization?
1989-NIPS-Dynamic Behavior of Constained Back-Propagation Networks
1988-NIPS-Comparing Biases for Minimal Network Construction with Back-Propagation
1989-NIPS-Optimal Brain Damage
1990-NN-A simple procedure for pruning back-propagation trained neural networks
1992-NIPS-Second order derivatives for network pruning: Optimal Brain Surgeon
1993-ICNN-Optimal Brain Surgeon and general network pruning

2000s

2001-JMLR-Sparse Bayesian learning and the relevance vector machine
2007-Book-The minimum description length principle

2011

2011-JMLR-Learning with Structured Sparsity
2011-NIPSw-Improving the speed of neural networks on CPUs

2013

2014

2014-BMVC-Speeding up convolutional neural networks with low rank expansions
2014-INTERSPEECH-1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
2014-NIPS-Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
2014-NIPS-Do deep neural nets really need to be deep
2014.12-Memory bounded deep convolutional networks

2015

2015-ICLR-Speeding-up convolutional neural networks using fine-tuned cp-decomposition
2015-ICML-Compressing neural networks with the hashing trick
2015-INTERSPEECH-A Diversity-Penalizing Ensemble Training Method for Deep Learning
2015-BMVC-Data-free parameter pruning for deep neural networks
2015-BMVC-Learning the structure of deep architectures using l1 regularization
2015-NIPS-Learning both Weights and Connections for Efficient Neural Network
2015-NIPS-Binaryconnect: Training deep neural networks with binary weights during propagations
2015-NIPS-Structured Transforms for Small-Footprint Deep Learning
2015-NIPS-Tensorizing Neural Networks
2015-NIPSw-Distilling Intractable Generative Models
2015-NIPSw-Federated Optimization:Distributed Optimization Beyond the Datacenter
2015-CVPR-Efficient and Accurate Approximations of Nonlinear Convolutional Networks [2016 TPAMI version: Accelerating Very Deep Convolutional Networks for Classification and Detection]
2015-CVPR-Sparse Convolutional Neural Networks
2015-ICCV-An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections
2015.12-Exploiting Local Structures with the Kronecker Layer in Convolutional Networks

2016

2016-ICLR-Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [Best paper!]
2016-ICLR-All you need is a good init [Code]
2016-ICLR-Data-dependent Initializations of Convolutional Neural Networks [Code]
2016-ICLR-Convolutional neural networks with low-rank regularization [Code]
2016-ICLR-Diversity networks
2016-ICLR-Neural networks with few multiplications
2016-ICLR-Compression of deep convolutional neural networks for fast and low power mobile applications
2016-ICLRw-Randomout: Using a convolutional gradient norm to win the filter lottery
2016-CVPR-Fast algorithms for convolutional neural networks
2016-CVPR-Fast ConvNets Using Group-wise Brain Damage
2016-BMVC-Learning neural network architectures using backpropagation
2016-ECCV-Less is more: Towards compact cnns
2016-EMNLP-Sequence-Level Knowledge Distillation
2016-NIPS-Learning Structured Sparsity in Deep Neural Networks [Caffe Code]
2016-NIPS-Dynamic Network Surgery for Efficient DNNs [Caffe Code]
2016-NIPS-Learning the Number of Neurons in Deep Neural Networks
2016-NIPS-Memory-Efficient Backpropagation Through Time
2016-NIPS-PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
2016-NIPS-LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
2016-NIPS-CNNpack: packing convolutional neural networks in the frequency domain
2016-ISCA-Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
2016-ICASSP-Learning compact recurrent neural networks
2016-CoNLL-Compression of Neural Machine Translation Models via Pruning
2016.03-Adaptive Computation Time for Recurrent Neural Networks
2016.06-Structured Convolution Matrices for Energy-efficient Deep learning
2016.06-Deep neural networks are robust to weight binarization and other non-linear distortions
2016.06-Hypernetworks
2016.07-IHT-Training skinny deep neural networks with iterative hard thresholding methods
2016.08-Recurrent Neural Networks With Limited Numerical Precision
2016.10-Deep model compression: Distilling knowledge from noisy teachers
2016.10-Federated Optimization: Distributed Machine Learning for On-Device Intelligence
2016.11-Alternating Direction Method of Multipliers for Sparse Convolutional Neural Networks

2017

2017-ICLR-Pruning Filters for Efficient ConvNets [PyTorch Reimpl. #1] [PyTorch Reimpl. #2]
2017-ICLR-Pruning Convolutional Neural Networks for Resource Efficient Inference
2017-ICLR-Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights [Code]
2017-ICLR-Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
2017-ICLR-DSD: Dense-Sparse-Dense Training for Deep Neural Networks
2017-ICLR-Faster CNNs with Direct Sparse Convolutions and Guided Pruning
2017-ICLR-Towards the Limit of Network Quantization
2017-ICLR-Loss-aware Binarization of Deep Networks
2017-ICLR-Trained Ternary Quantization [Code]
2017-ICLR-Exploring Sparsity in Recurrent Neural Networks
2017-ICLR-Soft Weight-Sharing for Neural Network Compression [Reddit discussion] [Code]
2017-ICLR-Variable Computation in Recurrent Neural Networks
2017-ICLR-Training Compressed Fully-Connected Networks with a Density-Diversity Penalty
2017-ICML-Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank
2017-ICML-Deep Tensor Convolution on Multicores
2017-ICML-Delta Networks for Optimized Recurrent Network Computation
2017-ICML-Beyond Filters: Compact Feature Map for Portable Deep Model
2017-ICML-Combined Group and Exclusive Sparsity for Deep Neural Networks
2017-ICML-MEC: Memory-efficient Convolution for Deep Neural Network
2017-ICML-Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
2017-ICML-ZipML: Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
2017-ICML-Analytical Guarantees on Numerical Precision of Deep Neural Networks
2017-ICML-Adaptive Neural Networks for Efficient Inference
2017-ICML-SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization
2017-CVPR-Learning deep CNN denoiser prior for image restoration
2017-CVPR-Deep roots: Improving cnn efficiency with hierarchical filter groups
2017-CVPR-More is less: A more complicated network with less inference complexity [PyTorch Code]
2017-CVPR-All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation
2017-CVPR-ResNeXt-Aggregated Residual Transformations for Deep Neural Networks
2017-CVPR-Xception: Deep learning with depthwise separable convolutions
2017-CVPR-Designing Energy-Efficient CNN using Energy-aware Pruning
2017-CVPR-Spatially Adaptive Computation Time for Residual Networks
2017-CVPR-Network Sketching: Exploiting Binary Structure in Deep CNNs
2017-CVPR-A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation
2017-ICCV-Channel pruning for accelerating very deep neural networks [Caffe Code]
2017-ICCV-Learning efficient convolutional networks through network slimming [PyTorch Code]
2017-ICCV-ThiNet: A filter level pruning method for deep neural network compression [Project] [Caffe Code] [2018 TPAMI version]
2017-ICCV-Interleaved group convolutions
2017-ICCV-Coordinating Filters for Faster Deep Neural Networks [Caffe Code]
2017-ICCV-Performance Guaranteed Network Acceleration via High-Order Residual Quantization
2017-NIPS-Net-trim: Convex pruning of deep neural networks with performance guarantee [Code] (Journal version: 2020-SIAM-Fast Convex Pruning of Deep Neural Networks)
2017-NIPS-Runtime neural pruning
2017-NIPS-Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon [Code]
2017-NIPS-Federated Multi-Task Learning
2017-NIPS-Towards Accurate Binary Convolutional Neural Network
2017-NIPS-Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations
2017-NIPS-TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
2017-NIPS-Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
2017-NIPS-Training Quantized Nets: A Deeper Understanding
2017-NIPS-The Reversible Residual Network: Backpropagation Without Storing Activations [Code]
2017-NIPS-Compression-aware Training of Deep Networks
2017-FPGA-ESE: efficient speech recognition engine with compressed LSTM on FPGA [Best paper!]
2017-AISTATS-Communication-Efficient Learning of Deep Networks from Decentralized Data
2017-ICASSP-Accelerating Deep Convolutional Networks using low-precision and sparsity
2017-NNs-Nonredundant sparse feature extraction using autoencoders with receptive fields clustering
2017.02-The Power of Sparsity in Convolutional Neural Networks
2017.07-Stochastic, Distributed and Federated Optimization for Machine Learning
2017.05-Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning
2017.07-Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
2017.11-GPU Kernels for Block-Sparse Weights [Code] (OpenAI)
2017.11-Block-sparse recurrent neural networks

2018

2018-AAAI-Auto-balanced Filter Pruning for Efficient Convolutional Neural Networks
2018-AAAI-Deep Neural Network Compression with Single and Multiple Level Quantization
2018-AAAI-Dynamic Deep Neural Networks_Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
2018-ICLRo-Training and Inference with Integers in Deep Neural Networks
2018-ICLR-Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
2018-ICLR-N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning
2018-ICLR-Model compression via distillation and quantization
2018-ICLR-Towards Image Understanding from Deep Compression Without Decoding
2018-ICLR-Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
2018-ICLR-Mixed Precision Training of Convolutional Neural Networks using Integer Operations
2018-ICLR-Mixed Precision Training
2018-ICLR-Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
2018-ICLR-Loss-aware Weight Quantization of Deep Networks
2018-ICLR-Alternating Multi-bit Quantization for Recurrent Neural Networks
2018-ICLR-Adaptive Quantization of Neural Networks
2018-ICLR-Variational Network Quantization
2018-ICLR-Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
2018-ICLR-Learning to share: Simultaneous parameter tying and sparsification in deep learning
2018-ICLR-Learning Sparse Neural Networks through L0 Regularization
2018-ICLR-WRPN: Wide Reduced-Precision Networks
2018-ICLR-Deep rewiring: Training very sparse deep networks
2018-ICLR-Efficient sparse-winograd convolutional neural networks [Code]
2018-ICLR-Learning Intrinsic Sparse Structures within Long Short-term Memory
2018-ICLR-Multi-scale dense networks for resource efficient image classification
2018-ICLR-Compressing Word Embedding via Deep Compositional Code Learning
2018-ICLR-Learning Discrete Weights Using the Local Reparameterization Trick
2018-ICLR-Training wide residual networks for deployment using a single bit for each weight
2018-ICLR-The High-Dimensional Geometry of Binary Neural Networks
2018-ICLRw-To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression (Similar topic: 2018-NIPSw-nip in the bud, 2018-NIPSw-rethink)
2018-CVPR-Context-Aware Deep Feature Compression for High-Speed Visual Tracking
2018-CVPR-NISP: Pruning Networks using Neuron Importance Score Propagation
2018-CVPR-Condensenet: An efficient densenet using learned group convolutions [Code]
2018-CVPR-Shift: A zero flop, zero parameter alternative to spatial convolutions
2018-CVPR-Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
2018-CVPR-Interleaved structured sparse convolutional neural networks
2018-CVPR-Towards Effective Low-bitwidth Convolutional Neural Networks
2018-CVPR-CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
2018-CVPR-Blockdrop: Dynamic inference paths in residual networks
2018-CVPR-Nestednet: Learning nested sparse structures in deep neural networks
2018-CVPR-Stochastic downsampling for cost-adjustable inference and improved regularization in convolutional networks
2018-CVPR-Wide Compression: Tensor Ring Nets
2018-CVPR-Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
2018-CVPR-Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks
2018-CVPR-HydraNets: Specialized Dynamic Architectures for Efficient Inference
2018-CVPR-SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
2018-CVPR-Towards Effective Low-Bitwidth Convolutional Neural Networks
2018-CVPR-Two-Step Quantization for Low-Bit Neural Networks
2018-CVPR-Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
2018-CVPR-"Learning-Compression" Algorithms for Neural Net Pruning
2018-CVPR-PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning [Code]
2018-CVPR-MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks [Code]
2018-CVPR-ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
2018-CVPRw-Squeezenext: Hardware-aware neural network design
2018-IJCAI-Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error
2018-IJCAI-Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks [PyTorch Code]
2018-IJCAI-Where to Prune: Using LSTM to Guide End-to-end Pruning
2018-IJCAI-Accelerating Convolutional Networks via Global & Dynamic Filter Pruning
2018-IJCAI-Optimization based Layer-wise Magnitude-based Pruning for DNN Compression
2018-IJCAI-Progressive Blockwise Knowledge Distillation for Neural Network Acceleration
2018-IJCAI-Complementary Binary Quantization for Joint Multiple Indexing
2018-ICML-Compressing Neural Networks using the Variational Information Bottleneck
2018-ICML-DCFNet: Deep Neural Network with Decomposed Convolutional Filters
2018-ICML-Deep k-Means Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
2018-ICML-Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
2018-ICML-High Performance Zero-Memory Overhead Direct Convolutions
2018-ICML-Kronecker Recurrent Units
2018-ICML-Weightless: Lossy weight encoding for deep neural network compression
2018-ICML-StrassenNets: Deep learning with a multiplication budget
2018-ICML-Learning Compact Neural Networks with Regularization
2018-ICML-WSNet: Compact and Efficient Networks Through Weight Sampling
2018-ICML-Gradually Updated Neural Networks for Large-Scale Image Recognition [Code]
2018-ICML-On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
2018-ICML-Understanding and simplifying one-shot architecture search
2018-ECCV-A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers [Code]
2018-ECCV-Coreset-Based Neural Network Compression
2018-ECCV-Data-Driven Sparse Structure Selection for Deep Neural Networks [MXNet Code]
2018-ECCV-Training Binary Weight Networks via Semi-Binary Decomposition
2018-ECCV-Learning Compression from Limited Unlabeled Data
2018-ECCV-Constraint-Aware Deep Neural Network Compression
2018-ECCV-Sparsely Aggregated Convolutional Networks
2018-ECCV-Deep Expander Networks: Efficient Deep Networks from Graph Theory [Code]
2018-ECCV-SparseNet-Sparsely Aggregated Convolutional Networks [Code]
2018-ECCV-Ask, acquire, and attack: Data-free uap generation using class impressions
2018-ECCV-Netadapt: Platform-aware neural network adaptation for mobile applications
2018-ECCV-Clustering Convolutional Kernels to Compress Deep Neural Networks
2018-ECCV-Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
2018-ECCV-Extreme Network Compression via Filter Group Approximation
2018-ECCV-Convolutional Networks with Adaptive Inference Graphs
2018-ECCV-SkipNet: Learning Dynamic Routing in Convolutional Networks [Code]
2018-ECCV-Value-aware Quantization for Training and Inference of Neural Networks
2018-ECCV-LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
2018-ECCV-AMC: AutoML for Model Compression and Acceleration on Mobile Devices
2018-ECCV-Piggyback: Adapting a single network to multiple tasks by learning to mask weights
2018-BMVCo-Structured Probabilistic Pruning for Convolutional Neural Network Acceleration
2018-BMVC-Efficient Progressive Neural Architecture Search
2018-BMVC-Igcv3: Interleaved lowrank group convolutions for efficient deep neural networks
2018-NIPS-Discrimination-aware Channel Pruning for Deep Neural Networks
2018-NIPS-Frequency-Domain Dynamic Pruning for Convolutional Neural Networks
2018-NIPS-ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions
2018-NIPS-DropBlock: A regularization method for convolutional networks
2018-NIPS-Constructing fast network through deconstruction of convolution
2018-NIPS-Learning Versatile Filters for Efficient Convolutional Neural Networks [Code]
2018-NIPS-Moonshine: Distilling with cheap convolutions
2018-NIPS-HitNet: hybrid ternary recurrent neural network
2018-NIPS-FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
2018-NIPS-Training DNNs with Hybrid Block Floating Point
2018-NIPS-Reversible Recurrent Neural Networks
2018-NIPS-Synaptic Strength For Convolutional Neural Network
2018-NIPS-Learning sparse neural networks via sensitivity-driven regularization
2018-NIPS-Multi-Task Zipping via Layer-wise Neuron Sharing
2018-NIPS-A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
2018-NIPS-Gradient Sparsification for Communication-Efficient Distributed Optimization
2018-NIPS-GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
2018-NIPS-ATOMO: Communication-efficient Learning via Atomic Sparsification
2018-NIPS-Norm matters: efficient and accurate normalization schemes in deep networks
2018-NIPS-Sparsified SGD with memory
2018-NIPS-Pelee: A Real-Time Object Detection System on Mobile Devices
2018-NIPS-Scalable methods for 8-bit training of neural networks
2018-NIPS-TETRIS: TilE-matching the TRemendous Irregular Sparsity
2018-NIPS-Training deep neural networks with 8-bit floating point numbers
2018-NIPS-Multiple instance learning for efficient sequential data classification on resource-constrained devices
2018-NIPS-Sparse dnns with improved adversarial robustness
2018-NIPSw-Pruning neural networks: is it time to nip it in the bud?
2018-NIPSw-Rethinking the Value of Network Pruning [2019 ICLR version] [PyTorch Code]
2018-NIPSw-Structured Pruning for Efficient ConvNets via Incremental Regularization [2019 IJCNN version] [Caffe Code]
2018-NIPSw-Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling
2018-NIPSw-Learning Sparse Networks Using Targeted Dropout [OpenReview] [Code]
2018-WACV-Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
2018.05-Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
2018.05-AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference
2018.10-A Closer Look at Structured Pruning for Neural Network Compression [Code]
2018.11-Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs
2018.11-PydMobileNet: Improved Version of MobileNets with Pyramid Depthwise Separable Convolution

2019

2019-MLSys-Towards Federated Learning at Scale: System Design
2019-MLsys-To compress or not to compress: Understanding the Interactions between Adversarial Attacks and Neural Network Compression
2019-ICLR-Slimmable Neural Networks [Code]
2019-ICLR-Defensive Quantization: When Efficiency Meets Robustness
2019-ICLR-Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters [Code]
2019-ICLR-ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [Code]
2019-ICLR-SNIP: Single-shot Network Pruning based on Connection Sensitivity
2019-ICLR-Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
2019-ICLR-Dynamic Channel Pruning: Feature Boosting and Suppression
2019-ICLR-Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
2019-ICLR-RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
2019-ICLR-Dynamic Sparse Graph for Efficient Deep Learning
2019-ICLR-Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
2019-ICLR-Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
2019-ICLR-Learning Recurrent Binary/Ternary Weights
2019-ICLR-Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
2019-ICLR-Relaxed Quantization for Discretized Neural Networks
2019-ICLR-Integer Networks for Data Compression with Latent-Variable Models
2019-ICLR-Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
2019-ICLR-Analysis of Quantized Models
2019-ICLR-DARTS: Differentiable Architecture Search [Code]
2019-ICLR-Graph HyperNetworks for Neural Architecture Search
2019-ICLR-Learnable Embedding Space for Efficient Neural Architecture Compression [Code]
2019-ICLR-Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution
2019-ICLR-SNAS: stochastic neural architecture search
2019-AAAIo-A layer decomposition-recomposition framework for neuron pruning towards accurate lightweight networks
2019-AAAI-Balanced Sparsity for Efficient DNN Inference on GPU [Code]
2019-AAAI-CircConv: A Structured Convolution with Low Complexity
2019-AAAI-Regularized Evolution for Image Classifier Architecture Search
2019-AAAI-Universal Approximation Property and Equivalence of Stochastic Computing-Based Neural Networks and Binary Neural Networks
2019-WACV-DAC: Data-free Automatic Acceleration of Convolutional Networks
2019-ASPLOS-Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
2019-CVPRo-HAQ: hardware-aware automated quantization
2019-CVPRo-Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Code]
2019-CVPR-All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
2019-CVPR-Importance Estimation for Neural Network Pruning [Code]
2019-CVPR-HetConv Heterogeneous Kernel-Based Convolutions for Deep CNNs
2019-CVPR-Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
2019-CVPR-Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
2019-CVPR-ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
2019-CVPR-Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search [Code]
2019-CVPR-Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation [Code]
2019-CVPR-MnasNet: Platform-Aware Neural Architecture Search for Mobile [Code]
2019-CVPR-MFAS: Multimodal Fusion Architecture Search
2019-CVPR-A Neurobiological Evaluation Metric for Neural Network Model Search
2019-CVPR-Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
2019-CVPR-Efficient Neural Network Compression [Code]
2019-CVPR-T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
2019-CVPR-Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure [Code]
2019-CVPR-DSC: Dense-Sparse Convolution for Vectorized Inference of Convolutional Neural Networks
2019-CVPR-DupNet: Towards Very Tiny Quantized CNN With Improved Accuracy for Face Detection
2019-CVPR-ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
2019-CVPR-Variational Convolutional Neural Network Pruning
2019-CVPR-Accelerating Convolutional Neural Networks via Activation Map Compression
2019-CVPR-Compressing Convolutional Neural Networks via Factorized Convolutional Filters
2019-CVPR-Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks
2019-CVPR-Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
2019-CVPR-MBS: Macroblock Scaling for CNN Model Reduction
2019-CVPR-On Implicit Filter Level Sparsity in Convolutional Neural Networks
2019-CVPR-Structured Pruning of Neural Networks With Budget-Aware Regularization
2019-CVPRo-Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization [Code]
2019-ICML-Approximated Oracle Filter Pruning for Destructive CNN Width Optimization [Code]
2019-ICML-EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis [PyTorch Code]
2019-ICML-Zero-Shot Knowledge Distillation in Deep Networks [Code]
2019-ICML-LegoNet: Efficient Convolutional Neural Networks with Lego Filters [Code]
2019-ICML-EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Code]
2019-ICML-Collaborative Channel Pruning for Deep Networks
2019-ICML-Training CNNs with Selective Allocation of Channels
2019-ICML-NAS-Bench-101: Towards Reproducible Neural Architecture Search [Code]
2019-ICML-Learning fast algorithms for linear transforms using butterfly factorizations
2019-ICMLw-Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks [Code] (AutoML workshop)
2019-IJCAI-Play and Prune: Adaptive Filter Pruning for Deep Model Compression
2019-BigComp-Towards Robust Compressed Convolutional Neural Networks
2019-ICCV-Rethinking ImageNet Pre-training
2019-ICCV-Universally Slimmable Networks and Improved Training Techniques
2019-ICCV-MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning [Code]
2019-ICCV-Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [Code]
2019-ICCV-Data-Free Quantization through Weight Equalization and Bias Correction
2019-ICCV-ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
2019-ICCV-Adversarial Robustness vs. Model Compression, or Both? [PyTorch Code]
2019-NIPS-Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
2019-NIPS-Model Compression with Adversarial Robustness: A Unified Optimization Framework
2019-NIPS-AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters
2019-NIPS-Double Quantization for Communication-Efficient Distributed Optimization
2019-NIPS-Focused Quantization for Sparse CNNs
2019-NIPS-E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
2019-NIPS-MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
2019-NIPS-Random Projections with Asymmetric Quantization
2019-NIPS-Network Pruning via Transformable Architecture Search [Code]
2019-NIPS-Point-Voxel CNN for Efficient 3D Deep Learning [Code]
2019-NIPS-Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks [PyTorch Code]
2019-NIPS-A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
2019-NIPS-Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations
2019-NIPS-Post training 4-bit quantization of convolutional networks for rapid-deployment
2019-PR-Filter-in-Filter: Improve CNNs in a Low-cost Way by Sharing Parameters among the Sub-filters of a Filter
2019-PRL-BDNN: Binary Convolution Neural Networks for Fast Object Detection
2019-TNNLS-Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning [Code]
2019.03-Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers [Code]
2019.03-Single Path One-Shot Neural Architecture Search with Uniform Sampling
2019.04-Resource Efficient 3D Convolutional Neural Networks
2019.04-Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks
2019.04-Knowledge Squeezed Adversarial Network Compression
2019.05-Dynamic Neural Network Channel Execution for Efficient Training
2019.06-AutoGrow: Automatic Layer Growing in Deep Convolutional Networks
2019.06-BasisConv: A method for compressed representation and learning in CNNs
2019.06-BlockSwap: Fisher-guided Block Substitution for Network Compression
2019.06-Separable Layers Enable Structured Efficient Linear Substitutions [Code]
2019.06-Butterfly Transform: An Efficient FFT Based Neural Architecture Design
2019.06-A Taxonomy of Channel Pruning Signals in CNNs
2019.08-Adversarial Neural Pruning with Latent Vulnerability Suppression
2019.09-Training convolutional neural networks with cheap convolutions and online distillation
2019.09-Pruning from Scratch
2019.11-Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness
2019.11-A Programmable Approach to Model Compression [Code]

2020

2020-AAAI-Pconv: The missing but desirable sparsity in dnn weight pruning for real-time execution on mobile devices
2020-AAAI-Channel Pruning Guided by Classification Loss and Feature Importance
2020-AAAI-Pruning from Scratch
2020-AAAI-Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks
2020-AAAI-AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
2020-AAAI-DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks
2020-AAAI-Real-Time Object Tracking via Meta-Learning: Efficient Model Adaptation and One-Shot Channel Pruning
2020-AAAI-Dynamic Network Pruning with Interpretable Layerwise Channel Selection
2020-AAAI-Reborn Filters: Pruning Convolutional Neural Networks with Limited Data
2020-AAAI-Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio
2020-AAAI-Sparsity-inducing Binarized Neural Networks
2020-AAAI-Structured Sparsification of Gated Recurrent Neural Networks
2020-AAAI-Hierarchical Knowledge Squeezed Adversarial Network Compression
2020-AAAI-Embedding Compression with Isotropic Iterative Quantization
2020-ICLR-Comparing Rewinding and Fine-tuning in Neural Network Pruning [Code]
2020-ICLR-Lookahead: A Far-sighted Alternative of Magnitude-based Pruning [Code]
2020-ICLR-Dynamic Model Pruning with Feedback
2020-ICLR-Provable Filter Pruning for Efficient Neural Networks
2020-ICLR-Data-Independent Neural Pruning via Coresets
2020-ICLR-FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
2020-ICLR-Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks
2020-ICLR-Neural Epitome Search for Architecture-Agnostic Network Compression
2020-ICLR-One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
2020-ICLR-DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures [Code]
2020-ICLR-Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
2020-ICLR-Scalable Model Compression by Entropy Penalized Reparameterization
2020-ICLR-A Signal Propagation Perspective for Pruning Neural Networks at Initialization
2020-CVPR-GhostNet: More Features from Cheap Operations [Code]
2020-CVPR-Filter Grafting for Deep Neural Networks
2020-CVPR-Low-rank Compression of Neural Nets: Learning the Rank of Each Layer
2020-CVPR-Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
2020-CVPR-Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
2020-CVPR-APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
2020-CVPR-Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression [Code]
2020-CVPR-Neural Network Pruning With Residual-Connections and Limited-Data
2020-CVPR-Multi-Dimensional Pruning: A Unified Framework for Model Compression
2020-CVPR-Discrete Model Compression With Resource Constraint for Deep Neural Networks
2020-CVPR-Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach
2020-CVPR-Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer
2020-CVPR-The Knowledge Within: Methods for Data-Free Model Compression
2020-CVPR-GAN Compression: Efficient Architectures for Interactive Conditional GANs [Code]
2020-CVPR-Few Sample Knowledge Distillation for Efficient Network Compression
2020-CVPR-Fast sparse convnets
2020-CVPR-Structured Multi-Hashing for Model Compression
2020-CVPRo-AdderNet: Do We Really Need Multiplications in Deep Learning? [Code]
2020-CVPRo-Towards Efficient Model Compression via Learned Global Ranking [Code]
2020-CVPRo-HRank: Filter Pruning Using High-Rank Feature Map [Code]
2020-CVPRo-DaST: Data-free Substitute Training for Adversarial Attacks [Code]
2020-ICML-PENNI: Pruned Kernel Sharing for Efficient CNN Inference [Code]
2020-ICML-Operation-Aware Soft Channel Pruning using Differentiable Masks
2020-ICML-DropNet: Reducing Neural Network Complexity via Iterative Pruning
2020-ICML-Network Pruning by Greedy Subnetwork Selection
2020-ICML-AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks
2020-ICML-Soft Threshold Weight Reparameterization for Learnable Sparsity [PyTorch Code]
2020-ICML-Activation sparsity: Inducing and exploiting activation sparsity for fast inference on deep neural networks
2020-EMNLP-Structured Pruning of Large Language Models [Code]
2020-NIPS-Pruning neural networks without any data by iteratively conserving synaptic flow
2020-NIPS-Neuron-level Structured Pruning using Polarization Regularizer
2020-NIPS-SCOP: Scientific Control for Reliable Neural Network Pruning
2020-NIPS-Directional Pruning of Deep Neural Networks
2020-NIPS-Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
2020-NIPS-Pruning Filter in Filter
2020-NIPS-HYDRA: Pruning Adversarially Robust Neural Networks
2020-NIPS-Movement Pruning: Adaptive Sparsity by Fine-Tuning
2020-NIPS-Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot
2020-NIPS-Position-based Scaled Gradient for Model Quantization and Pruning
2020-NIPS-The Generalization-Stability Tradeoff In Neural Network Pruning
2020-NIPS-FleXOR: Trainable Fractional Quantization
2020-NIPS-Adaptive Gradient Quantization for Data-Parallel SGD
2020-NIPS-Robust Quantization: One Model to Rule Them All
2020-NIPS-HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
2020-NIPS-Efficient Exact Verification of Binarized Neural Networks
2020-NIPS-Ultra-Low Precision 4-bit Training of Deep Neural Networks
2020-NIPS-Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
2020-NIPS-Fast fourier convolution
2020-NIPS-Structured Convolutions for Efficient Neural Network Design

2021

2021-WACV-CAP: Context-Aware Pruning for Semantic Segmentation [Code]
2021-AAAI-Few Shot Network Compression via Cross Distillation
2021-AAAI-Conditional Channel Pruning for Automated Model Compression [Code]
2021-ICLR-Neural Pruning via Growing Regularization [PyTorch Code]
2021-ICLR-Network Pruning That Matters: A Case Study on Retraining Variants
2021-ICLR-ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations
2021-ICLR-A Gradient Flow Framework For Analyzing Network Pruning (Spotlight)
2021-CVPR-Towards Compact CNNs via Collaborative Compression
2021-CVPR-Manifold Regularized Dynamic Network Pruning
2021-CVPR-Learnable Companding Quantization for Accurate Low-bit Neural Networks
2021-CVPR-Diversifying Sample Generation for Accurate Data-Free Quantization
2021-CVPR-Zero-shot Adversarial Quantization [Oral] [Code]
2021-CVPR-Network Quantization with Element-wise Gradient Scaling [Project]
2021-ICML-Group Fisher Pruning for Practical Network Compression [Code]
2021-ICML-Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
2021-ICML-A Probabilistic Approach to Neural Network Pruning
2021-ICML-On the Predictability of Pruning Across Scales
2021-ICML-Sparsifying Networks via Subdifferential Inclusion
2021-ICML-Selfish Sparse RNN Training [Code]
2021-ICML-Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training [Code]
2021-ICML-Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling
2021-ICML-ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
2021-ICML-Leveraging Sparse Linear Layers for Debuggable Deep Networks
2021-ICML-PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data
2021-ICML-BASE Layers: Simplifying Training of Large, Sparse Models [Code]
2021-ICML-Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
2021-ICML-I-BERT: Integer-only BERT Quantization
2021-ICML-Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
2021-ICML-Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
2021-ICML-Communication-Efficient Distributed Optimization with Quantized Preconditioners
2021-NIPS-Aligned Structured Sparsity Learning for Efficient Image Super-Resolution [Code] (Spotlight!)
2021-NIPS-Scatterbrain: Unifying Sparse and Low-rank Attention [Code]
2021-NIPS-Only Train Once: A One-Shot Neural Network Training And Pruning Framework [Code]
2021-NIPS-CHIP: CHannel Independence-based Pruning for Compact Neural Networks [Code]
2021.5-Dynamical Isometry: The Missing Ingredient for Neural Network Pruning

2022

2023

2023-ICLR-Trainability Preserving Neural Pruning [Code]
2023-ICLR-NTK-SAP: Improving neural network pruning by aligning training dynamics [Code]
2023-CVPR-DepGraph: Towards Any Structural Pruning[code]
2023-CVPR-CP3: Channel Pruning Plug-in for Point-based Networks
2023-ICML-UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers [code]
2023-ICML-Gradient-Free Structured Pruning with Unlabeled Data
2023-ICML-Reconstructive Neuron Pruning for Backdoor Defense [code]
2023-ICML-UPSCALE: Unconstrained Channel Pruning [code]
2023-ICML-SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot [code]
2023-ICML-Why Random Pruning Is All We Need to Start Sparse [code]
2023-ICML-Fast as CHITA: Neural Network Pruning with Combinatorial Optimization [code]
2023-ICML-A Three-regime Model of Network Pruning [code]
2023-ICML-Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models [code]
2023-ICML-Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint [code]
2023-ICML-Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming [Code]

Papers [Actual Acceleration via Sparsity]

2018-ICML-Efficient Neural Audio Synthesis
2018-NIPS-Tetris: Tile-matching the tremendous irregular sparsity
2021.4-Accelerating Sparse Deep Neural Networks (White paper from NVIDIA)
2021-ICLR-Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [Code]
2021-NIPS-Channel Permutations for N: M Sparsity [Code: NVIDIA ASP]
2021-NIPS-Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
2021-ICLR-Learning N:M fine-grained structured sparse neural networks from scratch [Code] [Slides]
2022-NIPS-UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units

Papers [Lottery Ticket Hypothesis (LTH)]

For LTH and other Pruning at Initialization papers, please refer to Awesome-Pruning-at-Initialization.

Papers [Bayesian Compression]

1995-Neural Computation-Bayesian Regularisation and Pruning using a Laplace Prior
1997-Neural Networks-Regularization with a Pruning Prior
2015-NIPS-Bayesian dark knowledge
2017-NIPS-Bayesian Compression for Deep Learning [Code]
2017-ICML-Variational dropout sparsifies deep neural networks
2017-NIPSo-Structured Bayesian Pruning via Log-Normal Multiplicative Noise
2017-ICMLw-Bayesian Sparsification of Recurrent Neural Networks
2020-NIPS-Bayesian Bits: Unifying Quantization and Pruning

Papers [Knowledge Distillation (KD)]

Before 2014

1996-Born again trees (proposed compressing neural networks and multipletree predictors by approximating them with a single tree)
2006-SIGKDD-Model compression
2010-ML-A theory of learning from different domains

2014

2014-NIPS-Do deep nets really need to be deep?
2014-NIPSw-Distilling the Knowledge in a Neural Network [Code]

2016

2017

2018

2018-AAAI-DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
2018-AAAI-Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution
2018-AAAI-Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net
2018-AAAI-Adversarial Learning of Portable Student Networks
2018-AAAI-Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
2018-ICLR-Large scale distributed neural network training through online distillation
2018-CVPR-Deep mutual learning
2018-ICML-Born-Again Neural Networks
2018-IJCAI-Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
2018-ECCV-2018-ECCV-Learning deep representations with probabilistic knowledge transfer [Code]
2018-ECCV-Graph adaptive knowledge transfer for unsupervised domain adaptation
2018-SIGKDD-Towards Evolutionary Compression
2018-NIPS-KDGAN: knowledge distillation with generative adversarial networks [2019 TPAMI version]
2018-NIPS-Knowledge Distillation by On-the-Fly Native Ensemble
2018-NIPS-Paraphrasing Complex Network: Network Compression via Factor Transfer
2018-NIPSw-Variational Mutual Information Distillation for Transfer Learning workshop: continual learning
2018-NIPSw-Transparent Model Distillation
2018.03-Interpreting Deep Classifier by Visual Distillation of Dark Knowledge
2018.11-Dataset Distillation [Code]
2018.12-Learning Student Networks via Feature Embedding
2018.12-Few Sample Knowledge Distillation for Efficient Network Compression

2019

2019-AAAI-Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
2019-AAAI-Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons [Code]
2019-AAAI-Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks [Code]
2019-CVPR-Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
2019-CVPR-Knowledge Distillation via Instance Relationship Graph
2019-CVPR-Variational Information Distillation for Knowledge Transfer
2019-CVPR-Learning Metrics from Teachers Compact Networks for Image Embedding [Code]
2019-ICCV-A Comprehensive Overhaul of Feature Distillation
2019-ICCV-Similarity-Preserving Knowledge Distillation
2019-ICCV-Correlation Congruence for Knowledge Distillation
2019-ICCV-Data-Free Learning of Student Networks
2019-ICCV-Learning Lightweight Lane Detection CNNs by Self Attention Distillation [Code]
2019-ICCV-Attention bridging network for knowledge transfer
2019-NIPS-Zero-shot Knowledge Transfer via Adversarial Belief Matching [Code] (spotlight)
2019.05-DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs

2020

2020-ICLR-Contrastive Representation Distillation [Code]
2020-AAAI-A Knowledge Transfer Framework for Differentially Private Sparse Learning
2020-AAAI-Uncertainty-aware Multi-shot Knowledge Distillation for Image-based Object Re-identification
2020-AAAI-Improved Knowledge Distillation via Teacher Assistant
2020-AAAI-Knowledge Distillation from Internal Representations
2020-AAAI-Distilling Knowledge from Well-informed Soft Labels for Neural Relation Extraction
2020-AAAI-Online Knowledge Distillation with Diverse Peers
2020-AAAI-Ultrafast Video Attention Prediction with Coupled Knowledge Distillation
2020-AAAI-Graph Few-shot Learning via Knowledge Transfer
2020-AAAI-Diversity Transfer Network for Few-Shot Learning
2020-AAAI-Few Shot Network Compression via Cross Distillation
2020-ICLR-Knowledge Consistency between Neural Networks and Beyond
2020-ICLR-Contrastive Representation Distillation [Code]
2020-ICLR-BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
2020-ICLR-Ensemble Distribution Distillation
2020-CVPR-Collaborative Distillation for Ultra-Resolution Universal Style Transfer [Code]
2020-CVPR-Explaining Knowledge Distillation by Quantifying the Knowledge
2020-CVPR-Self-training with Noisy Student improves ImageNet classification [Code]
2020-CVPR-Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model
2020-CVPR-Heterogeneous Knowledge Distillation Using Information Flow Modeling
2020-CVPR-Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing
2020-CVPR-Revisiting Knowledge Distillation via Label Smoothing Regularization
2020-CVPR-Distilling Knowledge From Graph Convolutional Networks
2020-CVPR-MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images [Code]
2020-CVPRo-Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion [Code]
2020-CVPR-Online Knowledge Distillation via Collaborative Learning
2020-CVPR-Distilling Cross-Task Knowledge via Relationship Matching
2020-CVPR-Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN
2020-CVPR-Regularizing Class-Wise Predictions via Self-Knowledge Distillation
2020-ICML-Feature-map-level Online Adversarial Knowledge Distillation
2020-NIPS-Self-Distillation as Instance-Specific Label Smoothing
2020-NIPS-Ensemble Distillation for Robust Model Fusion in Federated Learning
2020-NIPS-Self-Distillation Amplifies Regularization in Hilbert Space
2020-NIPS-MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
2020-NIPS-Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
2020-NIPS-Kernel Based Progressive Distillation for Adder Neural Networks
2020-NIPS-Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
2020-NIPS-Task-Oriented Feature Distillation
2020-NIPS-Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection
2020-NIPS-Distributed Distillation for On-Device Learning
2020-NIPS-Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
2020.12-Knowledge Distillation Thrives on Data Augmentation
2020.12-Multi-head Knowledge Distillation for Model Compression

2021

2021-AAAI-Cross-Layer Distillation with Semantic Calibration [Code]
2021-ICLR-Distilling Knowledge from Reader to Retriever for Question Answering
2021-ICLR-Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors
2021-ICLR-Knowledge distillation via softmax regression representation learning [Code]
2021-ICLR-Knowledge Distillation as Semiparametric Inference
2021-ICLR-Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
2021-ICLR-Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective
2021-CVPR-Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [PyTorch Code]
2021-CVPR-Complementary Relation Contrastive Distillation
2021-CVPR-Distilling Knowledge via Knowledge Review [Code]
2021-CVPR-Data-Free Knowledge Distillation For Image Super-Resolution
2021-ICML-KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation
2021-ICML-A statistical perspective on distillation
2021-ICML-Training data-efficient image transformers & distillation through attention
2021-ICML-Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model
2021-ICML-Data-Free Knowledge Distillation for Heterogeneous Federated Learning
2021-ICML-Simultaneous Similarity-based Self-Distillation for Deep Metric Learning
2021-NIPS-Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation [Code]

2022

Papers [AutoML (NAS etc.)]

Papers [Interpretability]

2010-JMLR-How to explain individual classification decisions
2015-PLOS ONE-On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
2015-CVPR-Learning to generate chairs with convolutional neural networks
2015-CVPR-Understanding deep image representations by inverting them [2016 IJCV version: Visualizing deep convolutional neural networks using natural pre-images]
2016-CVPR-Inverting Visual Representations with Convolutional Networks
2016-KDD-"Why Should I Trust You?": Explaining the Predictions of Any Classifier
2016-ICMLw-The Mythos of Model Interpretability
2017-NIPSw-The (Un)reliability of saliency methods
2017-DSP-Methods for interpreting and understanding deep neural networks
2018-ICML-Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors
2018-CVPR-Deep Image Prior [Code]
2018-NIPSs-Sanity Checks for Saliency Maps
2018-NIPSs-Human-in-the-Loop Interpretability Prior
2018-NIPS-To Trust Or Not To Trust A Classifier [Code]
2019-AISTATS-Interpreting Black Box Predictions using Fisher Kernels
2019.05-Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
2019.05-Adversarial Examples Are Not Bugs, They Are Features
2019.06-The Generalization-Stability Tradeoff in Neural Network Pruning
2019.06-One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
2019-Book-Interpretable Machine Learning

Workshops

2017-ICML Tutorial: interpretable machine learning
2018-ICML Workshop: Efficient Credit Assignment in Deep Learning and Reinforcement Learning
CDNNRIA Workshop (Compact Deep Neural Network Representation with Industrial Applications): 1st-2018-NIPSw, 2nd-2019-ICMLw
LLD Workshop (Learning with Limited Data): 1st-2017-NIPSw, 2nd-2019-ICLRw
WHI (Worshop on Human Interpretability in Machine Learning): 1st-2016-ICMLw, 2nd-2017-ICMLw, 3rd-2018-ICMLw
NIPS-18 Workshop on Systems for ML and Open Source Software
MLPCD Workshop (Machine Learning on the Phone and other Consumer Devices): 2nd-2018-NIPSw
Workshop on Bayesian Deep Learning
2020 CVPR Workshop on NAS

Books & Courses

TinyML and Efficient Deep Learning @MIT by Prof. Song Han

Lightweight DNN Engines/APIs

NNPACK
DMLC: Tensor Virtual Machine (TVM): Open Deep Learning Compiler Stack
Tencent: NCNN
Xiaomi: MACE, Mobile AI Benchmark
Alibaba: MNN blog (in Chinese)
Baidu: Paddle-Slim, Paddle-Mobile, Anakin
Microsoft: ELL, AutoML tool NNI
Facebook: Caffe2/PyTorch
Apple: CoreML (iOS 11+)
Google: ML-Kit, NNAPI (Android 8.1+), TF-Lite
Qualcomm: Snapdragon Neural Processing Engine (SNPE), Adreno GPU SDK
Huawei: HiAI
ARM: Tengine
Related: DAWNBench: An End-to-End Deep Learning Benchmark and Competition