prnvjb / Model-Compression-Acceleration

Paper list on model compression and acceleration

Model-Compression-Acceleration

Papers

Quantization

  • Product Quantization for Nearest Neighbor Search, TPAMI, 2011 [paper]
    • Introduces Product Quantization; the background section is worth reading.
  • Compressing Deep Convolutional Networks using Vector Quantization, ICLR, 2015 [paper]
    • An influential early work on Vector Quantization; learns the centroids with k-means.
  • Deep Learning with Limited Numerical Precision, ICML, 2015 [paper]
    • Runs 16-bit fixed-point experiments on MNIST and CIFAR.
  • Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks, ArXiv, 2016 [paper]
  • Fixed Point Quantization of Deep Convolutional Networks, ICML, 2016 [paper]
    • Derives how quantization error propagates through the network and uses that error to choose a layer-wise bit-width.
  • Quantized Convolutional Neural Networks for Mobile Devices, CVPR, 2016 [paper]
    • A form of Vector Quantization that approximates the weights with sub-vectors and codebooks.
  • Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, ICLR, 2017 [paper]
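    • Restricts weights to zero or powers of two; quantizes the weights incrementally over several rounds and compensates for the loss step by step.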
  • BinaryConnect: Training Deep Neural Networks with binary weights during propagations, NIPS, 2015 [paper]
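  • BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1, ArXiv, 2016 [paper]
  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, ECCV, 2016 [paper]
    • The most extreme form of quantization: each number is represented with a single bit.
    • In BinaryConnect only the weights are binarized; in BNN (BinaryNet) both weights and activations are binarized. Both sets of experiments were run on small datasets.
    • XNOR-Net follows the same idea as BNN but adds a scale to each layer to compensate for the information loss; in its ImageNet experiments, performance drops by 10 points.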
  • Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, ArXiv, 2016 [paper]
  • DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, ArXiv, 2016 [paper]
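    • An improvement on XNOR-Net that uses different bit-widths for weights and activations to compensate for the information loss; on ImageNet, 1-bit weights with 4-bit activations perform much better than XNOR-Net.
    • There is further work on improving binarized networks, such as ternary networks, which is not listed item by item here.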
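
To make the notes above concrete, here is a minimal pure-Python sketch of two of the ideas: 1-bit weights with a compensating scale, and signed fixed-point rounding. It is an illustration under stated assumptions, not the implementation from any listed paper. The function names `binarize_with_scale` and `to_fixed_point` are hypothetical, the scale is taken as the mean absolute weight (the choice described in the XNOR-Net paper), and round-to-nearest is used as a simplification for the fixed-point case.

```python
def binarize_with_scale(weights):
    """Approximate each weight w by alpha * sign(w) (hypothetical helper)."""
    # Assumption: alpha is the mean absolute value of the weights.
    alpha = sum(abs(w) for w in weights) / len(weights)
    # One bit per weight: +1 for non-negative values, -1 otherwise.
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def to_fixed_point(x, total_bits=16, frac_bits=8):
    """Round x onto a signed fixed-point grid and clamp it to the range."""
    scale = 1 << frac_bits                     # grid step is 1 / scale
    q_min = -(1 << (total_bits - 1))           # smallest representable integer
    q_max = (1 << (total_bits - 1)) - 1        # largest representable integer
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale


weights = [0.5, -1.5, 1.0, -1.0]
alpha, signs = binarize_with_scale(weights)
approx = [alpha * s for s in signs]            # reconstructed 1-bit weights
print(alpha, signs, approx)
print(to_fixed_point(0.3), to_fixed_point(1000.0))
```

The binarized weights need only the sign bits plus one floating-point scale, while the fixed-point helper shows how values outside the representable range saturate rather than wrap around.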

Pruning

  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR, 2016 [paper]
    • ICLR'16 best paper and an important work in model compression; combines quantization and pruning to compress AlexNet by more than 30x.
  • Optimal Brain Damage, NIPS, 1990 [paper]
  • Learning both Weights and Connections for Efficient Neural Network, NIPS, 2015 [paper]
  • Pruning Filters for Efficient ConvNets, ICLR, 2017 [paper]
  • Sparsifying Neural Network Connections for Face Recognition, CVPR, 2016 [paper]
  • Learning Structured Sparsity in Deep Neural Networks, NIPS, 2016 [paper]
  • Pruning Convolutional Neural Networks for Resource Efficient Inference, ICLR, 2017 [paper]
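
As a rough illustration of what pruning means in practice, the sketch below zeroes the weights with the smallest magnitude. This is one common criterion and is only an assumption here; the papers above use a range of criteria and usually fine-tune the network after pruning. The helper name `magnitude_prune` and its `sparsity` parameter are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending |w|; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
sparse_weights = magnitude_prune(weights, 0.5)
print(sparse_weights)
```

The result keeps the original layout with zeros in place, so any storage saving depends on a sparse format or on removing whole filters, as the structured pruning papers do.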

Knowledge Distillation

  • Distilling the Knowledge in a Neural Network, ArXiv, 2015 [paper]
  • FitNets: Hints for Thin Deep Nets, ICLR, 2015 [paper]
  • Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR, 2017 [paper]
  • Face Model Compression by Distilling Knowledge from Neurons, AAAI, 2016 [paper]
  • In Teacher We Trust: Learning Compressed Models for Pedestrian Detection, ArXiv, 2016 [paper]
  • Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, ArXiv, 2017 [paper]

Network Architecture

  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, ArXiv, 2016 [paper]
  • Convolutional Neural Networks at Constrained Time Cost, CVPR, 2015 [paper]
  • Flattened Convolutional Neural Networks for Feedforward Acceleration, ArXiv, 2014 [paper]
  • Going deeper with convolutions, CVPR, 2015 [paper]
  • Rethinking the Inception Architecture for Computer Vision, CVPR, 2016 [paper]
  • Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure, ArXiv, 2016 [paper]
  • Xception: Deep Learning with Depthwise Separable Convolutions, ArXiv, 2017 [paper]
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, ArXiv, 2017 [paper]
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, ArXiv, 2017 [paper]

Matrix Factorization(Low-rank Approximation)

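Strictly speaking, Matrix Factorization is formally a kind of Network Architecture, but the two lines of work start from slightly different motivations, and some papers are hard to assign strictly to one category, so they are listed separately here for now.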

  • Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, NIPS, 2014 [paper]
  • Speeding up Convolutional Neural Networks with Low Rank Expansions, BMVC, 2014 [paper]
  • Deep Fried Convnets, ICCV, 2015 [paper]
  • Accelerating Very Deep Convolutional Networks for Classification and Detection, TPAMI, 2016 [paper]
  • Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, ICLR, 2015 [paper]
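
The remark that Matrix Factorization is formally a kind of Network Architecture can be seen in a small sketch: a dense layer W that factors as U times V can be replaced by two thinner layers applied in sequence. The example below is pure Python and uses a hypothetical 4x4 layer that is exactly rank 1 by construction. Real layers are only approximately low rank, and the papers above obtain the factors with methods such as SVD or CP decomposition, which this sketch does not implement.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def num_params(matrix):
    """Count the entries of a matrix."""
    return len(matrix) * len(matrix[0])


# Hypothetical rank-1 factors of a 4x4 layer, so that W = U V exactly.
U = [[1.0], [2.0], [0.5], [-1.0]]       # 4 x 1
V = [[1.0, 0.0, 2.0, -1.0]]             # 1 x 4
W = matmul(U, V)                        # 4 x 4 dense layer

x = [1.0, 2.0, 3.0, 4.0]
dense_out = matvec(W, x)                # original layer
factored_out = matvec(U, matvec(V, x))  # two thin layers in sequence

print(dense_out, factored_out)
print(num_params(W), num_params(U) + num_params(V))
```

For an m x n layer and rank r, the factored form stores r(m + n) values instead of mn, which is the source of the saving whenever r is small compared with m and n.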
