Caffe-GDP is a fork of Caffe that adds a small amount of code to enable global and dynamic filter pruning (GDP) on the convolution layers of typical CNN architectures, as described in the IJCAI 2018 paper *Accelerating Convolutional Networks via Global & Dynamic Filter Pruning*. The implementation in the original paper is based on TensorFlow.
This document first introduces how GDP is implemented on top of the original Caffe framework, then gives a guide to performing GDP on a typical CNN. If you do not care about the implementation details, feel free to skip the first part.
The new members added to Caffe's data structures are listed below.

| New Member | Description |
|---|---|
| **Blob** | |
| `vector<Dtype*> filter_contribution_2D_;` | the channel-wise contribution of filters at the convolution layer |
| `vector<Dtype> filter_contrib_;` | the contribution of filters at the convolution layer |
| `vector<int> filter_mask_;` | the mask of filters at the convolution layer |
| **Net** | |
| `vector<int> conv_layer_ids_;` | the IDs of convolution layers in the net |
| `int num_filter_total_;` | the total number of filters in the net |
| `vector<Dtype> filter_contrib_total_;` | the collection of filter-wise contributions across the net |
| **BaseConvolutionLayer** | |
| `shared_ptr<Blob<Dtype> > masked_weight_;` | the masked weight blob that takes part in forward and backward passes |
| **Solver** | |
| `is_pruning`, etc. | newly added hyper-parameters for GDP in caffe.proto |
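As a rough illustration of what `masked_weight_` does, the masked weight used in forward and backward can be sketched as a per-filter multiplication. This is a simplified NumPy sketch, not the actual C++ code; it assumes `filter_mask_` holds one 0/1 entry per output filter:

```python
import numpy as np

def apply_filter_mask(weight, filter_mask):
    """Zero out whole filters of a conv weight blob.

    weight: array of shape (num_output, channels, kh, kw)
    filter_mask: 0/1 sequence of length num_output
    Returns the masked weight that would take part in the forward
    and backward passes. Masked filters stay in the blob (so they
    can be recovered later) but contribute nothing.
    """
    mask = np.asarray(filter_mask).reshape(-1, 1, 1, 1)
    return weight * mask
```

Because the mask only zeroes filters rather than deleting them, a filter pruned at one iteration can come back if its contribution rises again, which is the "dynamic" part of GDP.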
Caffe-GDP's training iteration differs from standard Caffe: right after backward propagation it updates the mask of each weight blob according to the global ranking of all filters' contributions, and it applies the mask to the weight blob before the forward pass of the next iteration.
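Conceptually, the global mask update can be sketched as follows. This is a simplified NumPy illustration of ranking filters across all layers and keeping the top fraction; the function name and per-layer score arrays are hypothetical stand-ins for `filter_contrib_` / `filter_contrib_total_`:

```python
import numpy as np

def update_filter_masks(filter_contribs, pruning_rate):
    """Keep the globally top-ranked filters; mask out the rest.

    filter_contribs: list of 1-D arrays, one per conv layer, holding
        the contribution score of each filter in that layer.
    pruning_rate: proportion of filters to keep (corresponds to the
        `pruning_rate` hyper-parameter).
    Returns a list of per-layer 0/1 masks.
    """
    # Pool every filter's contribution into one global ranking.
    all_contribs = np.concatenate(filter_contribs)
    num_keep = int(round(len(all_contribs) * pruning_rate))
    # Global threshold: the score of the num_keep-th best filter.
    threshold = np.sort(all_contribs)[::-1][num_keep - 1]
    # 1 keeps a filter, 0 masks it out for the next forward pass.
    return [(c >= threshold).astype(int) for c in filter_contribs]
```

Ranking globally (across layers) rather than per layer is what lets GDP prune unevenly: a layer with many redundant filters loses more of them than a layer whose filters all score highly.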
Here is an introduction to the newly added hyper-parameters in caffe.proto.

| Hyper-Parameter | Meaning | Default |
|---|---|---|
| `is_pruning` | whether to perform GDP | `false` |
| `pruning_rate` | the proportion of filters remaining after GDP | `1.0` |
| `mask_updating_step` | the number of steps in the mask-updating interval | `1` |
| `mask_updating_stepsize` | how often the mask-updating interval is changed | `1000` |
| `mi_policy` | the policy by which the mask-updating interval is changed (`"exp"` / `"minus"`) | `"minus"` |
| `log_type` | how the mask is printed (`"debug"` / `"release"`) | `"debug"` |
| `log_name` | the name of the log file for the printed mask | `"mask.log"` |
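Put together, a solver prototxt with GDP turned on might look like the following. The non-pruning fields are ordinary solver settings, and the specific values here are illustrative rather than copied from the repository:

```
# illustrative excerpt of a lenet_solver_pruning.prototxt
net: "examples/mnist/lenet_train_test.prototxt"
base_lr: 0.01
max_iter: 3000
snapshot_prefix: "examples/mnist/lenet"
# GDP hyper-parameters
is_pruning: true
pruning_rate: 0.5
mask_updating_step: 1
mask_updating_stepsize: 1000
mi_policy: "minus"
log_type: "debug"
log_name: "mask.log"
```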
The following is a step-by-step guide to performing GDP on a typical CNN, taking LeNet-5 as an example.

1. First, enter the root directory of Caffe-GDP and train the net from scratch as usual:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt
```
2. Turn on GDP in the solver prototxt, set the necessary hyper-parameters, and continue training from the trained weights:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver_pruning.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
```
3. Run the Python script to cut the caffemodel automatically according to mask.log:

```shell
python ./python/auto_caffemodel_pruning.py
```
4. (Optional) Fine-tune the pruned model:

```shell
./build/tools/caffe train -solver examples/mnist/lenet_solver_finetune.prototxt -weights examples/mnist/lenet_iter_3000_pruned.caffemodel
```
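The core idea of step 3 can be sketched as follows. This is a simplified NumPy illustration of physically removing masked-out filters, not the actual `auto_caffemodel_pruning.py` script; among other things it omits dropping the matching input channels of the following layer:

```python
import numpy as np

def prune_conv_weights(weight, bias, filter_mask):
    """Physically remove filters whose final mask entry is 0.

    weight: array of shape (num_output, channels, kh, kw)
    bias:   array of shape (num_output,)
    Returns the smaller weight/bias arrays that survive pruning.
    A full pruner must also remove the corresponding input channels
    from the next layer's weights, which this sketch leaves out.
    """
    keep = np.flatnonzero(np.asarray(filter_mask))
    return weight[keep], bias[keep]
```

Unlike the soft masking done during training, this step actually shrinks the blobs, which is where the reduction in caffemodel size comes from.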
When GDP is finished, we get a caffemodel of about 799 kB (with `pruning_rate: 0.5`), only 47.4% of the original 1684 kB, with an accuracy of 98.91% compared to the original 99.02%.
GDP is a learnable pruning method for typical CNN architectures that makes the net much thinner and faster while maintaining close to the original level of accuracy.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details like
- DIY Deep Learning for Vision with Caffe
- Tutorial Documentation
- BAIR reference models and the community model zoo
- Installation instructions and step-by-step examples
Custom distributions:
- Intel Caffe (optimized for CPU with support for multi-node), in particular Xeon processors (HSW, BDW, SKX, Xeon Phi)
- OpenCL Caffe e.g. for AMD or Intel devices.
- Windows Caffe
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
```
@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
```