Caffe-GDP is a fork of Caffe that adds a small amount of code to enable global and dynamic filter pruning (GDP) on the convolution layers of typical CNN architectures, as described in the IJCAI 2018 paper *Accelerating Convolutional Networks via Global & Dynamic Filter Pruning*. The implementation in the original paper is based on TensorFlow.
This document first introduces how GDP is implemented on top of the original Caffe framework, then gives a guide to performing GDP on a typical CNN. If you do not care about the implementation details, feel free to skip the first part.
The new members added to Caffe's data structures are listed below.

| New Member | Description |
|---|---|
| **Blob** | |
| `vector<Dtype*> filter_contribution_2D_;` | the channel-wise contribution of filters at the convolution layer |
| `vector<Dtype> filter_contrib_;` | the contribution of filters at the convolution layer |
| `vector<int> filter_mask_;` | the mask of filters at the convolution layer |
| **Net** | |
| `vector<int> conv_layer_ids_;` | the IDs of convolution layers in the net |
| `int num_filter_total_;` | the total number of filters in the net |
| `vector<Dtype> filter_contrib_total_;` | the collection of filter-wise contributions across the net |
| **BaseConvolutionLayer** | |
| `shared_ptr<Blob<Dtype> > masked_weight_;` | the masked weight blob that takes part in forward and backward passes |
| **Solver** | |
| `is_pruning`, etc. | newly added hyper-parameters for GDP in caffe.proto |
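As a rough illustration of what `masked_weight_` does, the masked weight used in forward and backward can be sketched as a per-filter multiplication. This is a simplified NumPy sketch, not the actual C++ code; it assumes `filter_mask_` holds one 0/1 entry per output filter:

```python
import numpy as np

def apply_filter_mask(weight, filter_mask):
    """Zero out whole filters of a conv weight blob.

    weight: array of shape (num_output, channels, kh, kw)
    filter_mask: 0/1 sequence of length num_output
    Returns the masked weight that would take part in the forward
    and backward passes. Masked filters stay in the blob (so they
    can be recovered later) but contribute nothing.
    """
    mask = np.asarray(filter_mask).reshape(-1, 1, 1, 1)
    return weight * mask
```

Because the mask only zeroes filters rather than deleting them, a filter pruned at one iteration can come back if its contribution rises again, which is the "dynamic" part of GDP.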
Caffe-GDP's training iteration differs from standard Caffe: right after backward propagation it updates the mask of each weight blob according to the global ranking of all filters' contributions, and it applies the mask to the weight blob before the forward pass of the next iteration.
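Conceptually, the global mask update can be sketched as follows. This is a simplified NumPy illustration of ranking filters across all layers and keeping the top fraction; the function name and per-layer score arrays are hypothetical stand-ins for `filter_contrib_` / `filter_contrib_total_`:

```python
import numpy as np

def update_filter_masks(filter_contribs, pruning_rate):
    """Keep the globally top-ranked filters; mask out the rest.

    filter_contribs: list of 1-D arrays, one per conv layer, holding
        the contribution score of each filter in that layer.
    pruning_rate: proportion of filters to keep (corresponds to the
        `pruning_rate` hyper-parameter).
    Returns a list of per-layer 0/1 masks.
    """
    # Pool every filter's contribution into one global ranking.
    all_contribs = np.concatenate(filter_contribs)
    num_keep = int(round(len(all_contribs) * pruning_rate))
    # Global threshold: the score of the num_keep-th best filter.
    threshold = np.sort(all_contribs)[::-1][num_keep - 1]
    # 1 keeps a filter, 0 masks it out for the next forward pass.
    return [(c >= threshold).astype(int) for c in filter_contribs]
```

Ranking globally (across layers) rather than per layer is what lets GDP prune unevenly: a layer with many redundant filters loses more of them than a layer whose filters all score highly.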
Here is an introduction to the newly added hyper-parameters in caffe.proto.

| Hyper-Parameter | Meaning | Default |
|---|---|---|
| `is_pruning` | whether to perform GDP | `false` |
| `pruning_rate` | the proportion of filters remaining after GDP | `1.0` |
| `mask_updating_step` | the number of steps in the mask-updating interval | `1` |
| `mask_updating_stepsize` | how often the mask-updating interval is changed | `1000` |
| `mi_policy` | the policy by which the mask-updating interval is changed (`"exp"` / `"minus"`) | `"minus"` |
| `log_type` | how the mask is printed (`"debug"` / `"release"`) | `"debug"` |
| `log_name` | the name of the log file for the printed mask | `"mask.log"` |
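Put together, a solver prototxt with GDP turned on might look like the following. The non-pruning fields are ordinary solver settings, and the specific values here are illustrative rather than copied from the repository:

```
# illustrative excerpt of a lenet_solver_pruning.prototxt
net: "examples/mnist/lenet_train_test.prototxt"
base_lr: 0.01
max_iter: 3000
snapshot_prefix: "examples/mnist/lenet"
# GDP hyper-parameters
is_pruning: true
pruning_rate: 0.5
mask_updating_step: 1
mask_updating_stepsize: 1000
mi_policy: "minus"
log_type: "debug"
log_name: "mask.log"
```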
The following is a step-by-step guide to performing GDP on a typical CNN, taking LeNet-5 as an example.

1. First, enter the root directory of Caffe-GDP and train the net from scratch as usual:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt
```
2. Turn on GDP in the solver prototxt, set the necessary hyper-parameters, and continue training from the trained weights:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver_pruning.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
```
3. Run the Python script to cut the caffemodel automatically according to mask.log:

```shell
python ./python/auto_caffemodel_pruning.py
```
4. (Optional) Fine-tune the pruned model:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver_finetune.prototxt -weights examples/mnist/lenet_iter_3000_pruned.caffemodel
```
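The core idea of step 3 can be sketched as follows. This is a simplified NumPy illustration of physically removing masked-out filters, not the actual `auto_caffemodel_pruning.py` script; among other things it omits dropping the matching input channels of the following layer:

```python
import numpy as np

def prune_conv_weights(weight, bias, filter_mask):
    """Physically remove filters whose final mask entry is 0.

    weight: array of shape (num_output, channels, kh, kw)
    bias:   array of shape (num_output,)
    Returns the smaller weight/bias arrays that survive pruning.
    A full pruner must also remove the corresponding input channels
    from the next layer's weights, which this sketch leaves out.
    """
    keep = np.flatnonzero(np.asarray(filter_mask))
    return weight[keep], bias[keep]
```

Unlike the soft masking done during training, this step actually shrinks the blobs, which is where the reduction in caffemodel size comes from.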
When GDP is finished, we get a caffemodel of about 799 kB (with `pruning_rate: 0.5`), only 47.4% of the original 1684 kB, with an accuracy of 98.91% compared to the original 99.02%.
GDP is a learnable pruning method for typical CNN architectures that makes the net much thinner and faster while maintaining close to the original level of accuracy.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details like
- DIY Deep Learning for Vision with Caffe
- Tutorial Documentation
- BAIR reference models and the community model zoo
- Installation instructions and step-by-step examples
Custom distributions:
- Intel Caffe (optimized for CPU with support for multi-node), in particular Xeon processors (HSW, BDW, SKX, Xeon Phi)
- OpenCL Caffe e.g. for AMD or Intel devices.
- Windows Caffe
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
```
@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
```