chair-dsgt / mip-for-ann

Official PyTorch and CVXPY implementation of Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming

Home Page: https://arxiv.org/abs/2002.07259

Identify critical neurons using Mixed Integer Programming

A novel way of computing neuron importance scores at fully connected and convolutional layers, and of using these scores to prune non-critical neurons with a marginal loss in accuracy and without fine-tuning or retraining.

Getting Started

These instructions will get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Install the requirements

pip3 install -r requirements.txt

MIP Solver

We use the commercial MOSEK solver. To run the experiments, a license file mosek.lic is required at /root/mosek (on Ubuntu). To use another solver, open the sparsify_model.py script and change solver=cp.MOSEK to one of the solvers in the table below (CVXPY is a Python-embedded modeling language for convex optimization built on top of different solvers); a minimal solver-selection sketch follows the table.

Available Solvers for CVXPY

          LP  QP  SOCP  SDP  EXP  MIP
CBC       X                       X
GLPK      X
GLPK_MI   X                       X
OSQP      X   X
CPLEX     X   X   X               X
ECOS      X   X   X           X
ECOS_BB   X   X   X           X   X
GUROBI    X   X   X               X
MOSEK     X   X   X     X     X   X
CVXOPT    X   X   X     X
SCS       X   X   X     X     X
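
A minimal sketch of solver selection in CVXPY. The toy problem below is illustrative only (it is not the MIP from the paper); the point is the solver= argument to solve():

import cvxpy as cp

# Toy mixed-integer program: find the integer closest to 2.4.
# Illustrative only; this is not the repository's actual MIP.
x = cp.Variable(integer=True)
problem = cp.Problem(cp.Minimize(cp.square(x - 2.4)))

# Swap cp.MOSEK for any MIP-capable solver from the table,
# e.g. cp.GUROBI, cp.CPLEX, or cp.ECOS_BB.
problem.solve(solver=cp.MOSEK)
print(x.value)  # -> 2.0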

Running the experiments

All the experiments reported in the paper are in experiments_notebook.ipynb

Training Models

$ python3 train_model.py 

Arguments

  • -sd : parent directory to store logs and models
  • -e : number of training epochs
  • -dl : dataset index to be used for training with the following order ['MNIST', 'FashionMNIST', 'KMNIST', 'Caltech256', 'CIFAR10']
  • -r : number of training resets, used to train multiple models with different initializations
  • -m : model index with the following order ['FullyConnectedBaselineModel', 'FullyConnected2Model', 'Lecun Model 98', 'Dense Fully Connected', 'Lenet', 'vgg19']
  • -op : optimizer index with the following order ['Adam', 'SGD', 'RMSPROP']
  • -l : learning rate index with the following order ['1e-1', '1e-2', '1e-3', '1e-5']
  • -bs : batch size used during training
  • -dgl : a flag to enable decoupled greedy learning during training
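
For example, a hypothetical invocation (all values are illustrative, and the -sd directory name is assumed; per the index lists above, -dl 0 would select MNIST, -m 0 the FullyConnectedBaselineModel, -op 0 Adam, and -l 2 a 1e-3 learning rate):

$ python3 train_model.py -sd experiments -e 20 -dl 0 -m 0 -op 0 -l 2 -bs 64 -r 1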

Sparsifying Models

$ python3 run_sparsify.py

Arguments

Takes the same arguments as training (to select the right experiment directory), plus the following extra arguments:

  • -tt : pruning threshold (neurons with an importance score below this threshold are pruned)
  • -sw : λ, which controls the trade-off with accuracy (a larger weight prunes less, preserving predictive capacity)
  • -ft : a flag to enable fine-tuning after pruning
  • -te : number of fine-tuning epochs
  • -n : number of data points fed as input to the MIP
  • -mth : a flag that, when enabled, uses the mean of each layer's importance scores as that layer's pruning threshold
  • -f : a flag that forces re-computing the neuron importance scores instead of using cached results from previous runs
  • -rl : a flag to relax the ReLU constraints
  • -dgl : a flag to use auxiliary networks trained per layer to compute neuron importance scores for large models
  • -seq : a flag to run the MIP independently on each class and then average the results
  • -bll : a flag to run the MIP on each layer independently, starting from the last layer
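
For example, a hypothetical run on a previously trained experiment (the threshold, λ weight, and number of MIP data points are illustrative values, not recommendations; the shared arguments must match the training run):

$ python3 run_sparsify.py -sd experiments -dl 0 -m 0 -op 0 -l 2 -tt 0.05 -sw 5 -n 100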

Sparsifying every n iterations/epochs

$ python3 train_sparsify.py

Arguments

Takes the same arguments as training and sparsifying (to select the right experiment directory), plus the following extra arguments:

  • -trst : a flag to run sparsification every n iterations; if disabled, it runs every n epochs
  • -ent : an integer n, the number of epochs/iterations between sparsification steps
  • -incr : a flag to enable incremental training of the computed sub-network
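
For example, a hypothetical run that sparsifies every 5 epochs and incrementally trains the resulting sub-network (all values illustrative):

$ python3 train_sparsify.py -sd experiments -dl 0 -m 0 -e 20 -ent 5 -incr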

Robustness to Different Batches Experiments

$ python3 verify_selected_data.py

Arguments

Takes the same arguments as run_sparsify.py; plots how the pruning percentage and the accuracy change when the batch of images fed to the MIP changes.

Different Lambdas Experiments

$ python3 plot_different_lambdas.py

Arguments

Takes the same arguments as run_sparsify.py; plots how the pruning percentage and the accuracy change when the value of λ (-sw) changes.

Robustness of Runs Averaged over Different Classes

$ python3 batch_data_experiments.py

Arguments

Takes the same arguments as run_sparsify.py, plus the following extra arguments:

  • -nex : an integer denoting the number of experiments conducted
  • -bbm : a flag that, when enabled, samples a balanced set of images per class; otherwise a random number of images per class is used
  • -ppexp : a flag that, when enabled, runs the MIP independently per class, with the neuron importance score averaged over the runs
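
For example, a hypothetical run averaging 5 experiments with balanced per-class batches (all values illustrative):

$ python3 batch_data_experiments.py -sd experiments -dl 0 -m 0 -nex 5 -bbm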

References

@article{elaraby2020identifying,
  title={Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming},
  author={ElAraby, Mostafa and Wolf, Guy and Carvalho, Margarida},
  journal={arXiv preprint arXiv:2002.07259},
  year={2020}
}
@article{mosek2010mosek,
  title={The MOSEK optimization software},
  author={Mosek, APS},
  journal={Online at http://www.mosek.com},
  volume={54},
  number={2-1},
  pages={5},
  year={2010}
}
@article{cvxpy,
  author  = {Steven Diamond and Stephen Boyd},
  title   = {{CVXPY}: A {P}ython-Embedded Modeling Language for Convex Optimization},
  journal = {Journal of Machine Learning Research},
  year    = {2016},
  volume  = {17},
  number  = {83},
  pages   = {1--5},
}
@article{cvxpy_rewriting,
  author  = {Akshay Agrawal and Robin Verschueren and Steven Diamond and Stephen Boyd},
  title   = {A Rewriting System for Convex Optimization Problems},
  journal = {Journal of Control and Decision},
  year    = {2018},
  volume  = {5},
  number  = {1},
  pages   = {42--60},
}

License

MIT license
