This project is an implementation of the paper DeepShift: Towards Multiplication-Less Neural Networks, which aims to replace multiplications in neural networks with bitwise shifts (and sign changes).
This research project was done at Huawei Technologies.
The main idea of DeepShift is to test the ability to train and infer using bitwise shifts.
We present two approaches (a sketch of the rounding idea follows this list):
- DeepShift-Q: the parameters are floating-point weights, just like regular networks, but the weights are rounded to powers of 2 during the forward and backward passes.
- DeepShift-PS: the parameters themselves are the signs and shift values.
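To make the rounding concrete, here is a minimal PyTorch sketch of the DeepShift-Q idea, assuming a straight-through estimator for the backward pass; the names `round_to_power_of_2` and `RoundToPow2STE` are illustrative, not the repo's actual API:

```python
import torch

def round_to_power_of_2(w):
    # Replace each weight with sign(w) * 2^round(log2|w|), so that
    # multiplying by it reduces to a bitwise shift plus a sign change.
    sign = torch.sign(w)
    shift = torch.round(torch.log2(torch.abs(w).clamp(min=1e-12)))
    return sign * torch.pow(2.0, shift)

class RoundToPow2STE(torch.autograd.Function):
    # Straight-through estimator: the forward pass uses the rounded
    # weights, while the backward pass sends gradients through to the
    # underlying floating-point weights unchanged.
    @staticmethod
    def forward(ctx, w):
        return round_to_power_of_2(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

# Illustrative use inside a layer's forward pass:
# y = torch.nn.functional.linear(x, RoundToPow2STE.apply(self.weight))
```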
To train from scratch, the learning rate (the `--lr` option) should be set to 0.01. To train from a pre-trained model, it should be set to 0.001 and `--lr-step-size` should be set to 5. To use DeepShift-PS, the `--optimizer` option must be set to `radam` in order to obtain good results.
Clone the repo:
```
git clone https://github.com/mostafaelhoushi/DeepShift.git
```
Change into the cloned directory:
```
cd DeepShift
```
Create a virtual environment:
```
virtualenv venv --prompt="(DeepShift) " --python=/usr/bin/python3.6
```
(Needs to be done every time you run the code) Source the environment:
```
source venv/bin/activate
```
Install the required packages and build the spfpm package for fixed-point arithmetic:
```
pip install -r requirements.txt
```
cd into the `pytorch` directory:
```
cd pytorch
```
Now you can run the different scripts with different options, e.g.:
a) Train a DeepShift simple fully-connected model on the MNIST dataset, using the PS approach:
```
python mnist.py --shift-depth 3 --shift-type PS --optimizer radam
```
b) Train a DeepShift simple convolutional model on the MNIST dataset, using the Q approach:
```
python mnist.py --type conv --shift-depth 3 --shift-type Q
```
c) Train a DeepShift ResNet20 on the CIFAR10 dataset from scratch:
```
python cifar10.py --arch resnet20 --pretrained False --shift-depth 1000 --shift-type Q
```
d) Train a DeepShift ResNet18 model on the ImageNet dataset using converted pre-trained weights for 5 epochs with a learning rate of 0.001:
```
python imagenet.py <path to imagenet dataset> --arch resnet18 --pretrained True --shift-depth 1000 --shift-type Q --epochs 5 --lr 0.001
```
e) Train a DeepShift ResNet18 model on the ImageNet dataset from scratch with an initial learning rate of 0.01:
```
python imagenet.py <path to imagenet dataset> --arch resnet18 --pretrained False --shift-depth 1000 --shift-type PS --optimizer radam --lr 0.01
```
Running the Bitwise Shift CUDA & CPU Kernels
cd into the `DeepShift/pytorch` directory:
```
cd DeepShift/pytorch
```
Run the installation script to install our CPU and CUDA kernels that perform matrix multiplication and convolution using bitwise shifts:
Now you can run a model with actual bitwise-shift kernels in CUDA using the `--use-kernel True` option. Remember that the kernels only work for inference, not training, so you need to add the `-e` option as well:
```
python imagenet.py --arch resnet18 -e --shift-depth 1000 --pretrained True --use-kernel True
```
To compare the latency with a naive regular convolution kernel that does not include cuDNN's other optimizations:
```
python imagenet.py --arch resnet18 -e --pretrained True --use-kernel True
```
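The trick these kernels exploit can be shown in a few lines of Python: multiplying an integer by ±2^s needs only a bitwise shift and a sign change. This is a toy sketch of the idea, not the actual kernel code:

```python
def shift_mul(x: int, sign: int, shift: int) -> int:
    # Multiply x by sign * 2**shift using only a shift and a sign flip.
    # A negative shift becomes an arithmetic right shift, i.e. the
    # fixed-point approximation of multiplying by 2**shift < 1.
    y = x << shift if shift >= 0 else x >> -shift
    return -y if sign < 0 else y

assert shift_mul(3, +1, 4) == 3 * 16    # 3 * 2^4
assert shift_mul(3, -1, 2) == -(3 * 4)  # 3 * -(2^2)
```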
Accuracies on the MNIST dataset:

Train from scratch:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| Simple FC Model | 96.92% | 97.03% | 98.26% |
| Simple Conv Model | 98.75% | 98.81% | 99.12% |

Train from pre-trained weights:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| Simple FC Model | 96.92% | 94.91% | 98.26% |
| Simple Conv Model | 98.75% | 99.15% | 99.16% |
Accuracies on the CIFAR10 dataset:

Train from scratch:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| ResNet18 | 94.45% | 94.42% | 93.20% |
| MobileNetv2 | 93.57% | 93.63% | 92.64% |
| ResNet20 | 91.79% | 89.85% | 88.84% |
| ResNet32 | 92.39% | 91.13% | 89.97% |
| ResNet44 | 92.84% | 91.29% | 90.92% |
| ResNet56 | 93.46% | 91.52% | 91.11% |

Train from pre-trained weights:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| ResNet18 | 94.45% | 94.25% | 94.12% |
| MobileNetv2 | 93.57% | 93.04% | 92.78% |
Accuracies on the CIFAR10 dataset at reduced weight bit-widths:

| Model | Type | Weight Bits | Train from Scratch | Train from Pre-Trained |
| ----- | ---- | ----------- | ------------------ | ---------------------- |
| ResNet18 | Original | 32 | 94.45% | - |
| ResNet18 | DeepShift-PS | 5 | 93.20% | 94.12% |
| ResNet18 | DeepShift-PS | 4 | 94.12% | 94.13% |
| ResNet18 | DeepShift-PS | 3 | 92.85% | 91.16% |
| ResNet18 | DeepShift-PS | 2 | 92.80% | 90.68% |
Accuracies on the ImageNet dataset (shown as Top1 / Top5):

Train from scratch:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| ResNet18 | 69.76% / 89.08% | 65.32% / 86.29% | 65.34% / 86.05% |
| ResNet50 | 76.13% / 92.86% | 70.70% / 90.20% | 71.90% / 90.20% |
| VGG16 | 71.59% / 90.38% | 70.87% / 90.09% | TBD |

Train from pre-trained weights:

| Model | Original | DeepShift-Q | DeepShift-PS |
| ----- | -------- | ----------- | ------------ |
| ResNet18 | 69.76% / 89.08% | 69.56% / 89.17% | 69.27% / 89.00% |
| ResNet50 | 76.13% / 92.86% | 76.33% / 93.05% | 75.93% / 92.90% |
| GoogleNet | 69.78% / 89.53% | 71.56% / 90.48% | 71.39% / 90.33% |
| VGG16 | 71.59% / 90.38% | 71.56% / 90.48% | 71.39% / 90.33% |
| AlexNet | 56.52% / 79.07% | 55.81% / 78.79% | 55.90% / 78.73% |
| DenseNet121 | 74.43% / 91.97% | 74.52% / 92.06% | TBD |
Accuracies on the ImageNet dataset at reduced weight bit-widths (Top1 / Top5):

| Model | Type | Weight Bits | Train from Scratch | Train from Pre-Trained |
| ----- | ---- | ----------- | ------------------ | ---------------------- |
| ResNet18 | Original | 32 | 69.76% / 89.08% | - |
| ResNet18 | DeepShift-Q | 5 | 65.34% / 86.05% | 69.56% / 89.17% |
| ResNet18 | DeepShift-PS | 5 | 65.34% / 86.05% | 69.27% / 89.00% |
| ResNet18 | DeepShift-Q | 4 | TBD | 69.56% / 89.14% |
| ResNet18 | DeepShift-PS | 4 | 67.07% / 87.36% | 69.02% / 88.73% |
| ResNet18 | DeepShift-PS | 3 | 63.11% / 84.45% | TBD |
| ResNet18 | DeepShift-PS | 2 | 60.80% / 83.01% | TBD |
Binary Files of Trained Models
TBD
- `pytorch`: directory containing the implementation, tests, and saved models using PyTorch
    - `deepshift`: directory containing the PyTorch models as well as the CUDA and CPU kernels of the `LinearShift` and `Conv2dShift` ops
    - `unoptimized`: directory containing the PyTorch models as well as the CUDA and CPU kernels of the naive implementations of the `Linear` and `Conv2d` ops
    - `mnist.py`: example script to train and infer on the MNIST dataset using simple models, in both their original form and their DeepShift versions
    - `cifar10.py`: example script to train and infer on the CIFAR10 dataset using various models, in both their original form and their DeepShift versions
    - `imagenet.py`: example script to train and infer on the ImageNet dataset using various models, in both their original form and their DeepShift versions
    - `optim`: directory containing the definitions of the RAdam and Ranger optimizers. The RAdam optimizer is crucial for DeepShift-PS to obtain the accuracies shown here.
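For context on what RAdam is updating in the PS case, here is a hedged sketch of a DeepShift-PS-style linear layer whose trainable parameters are per-weight shift exponents and signs; the class name, initialization, and rounding trick are illustrative and do not mirror the repo's `LinearShift` implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearPS(nn.Module):
    # Illustrative DeepShift-PS-style layer: the trainable parameters
    # are shift exponents and signs, and the effective weight is
    # sign * 2^round(shift). Detach-based straight-through rounding
    # keeps both parameter tensors trainable.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.shift = nn.Parameter(torch.full((out_features, in_features), -4.0))
        self.sign = nn.Parameter(torch.ones(out_features, in_features))

    def forward(self, x):
        # Forward sees the discretized values; backward sees the
        # identity, so gradients still reach self.shift and self.sign.
        shift_q = self.shift + (torch.round(self.shift) - self.shift).detach()
        sign_q = self.sign + (torch.sign(self.sign) - self.sign).detach()
        weight = sign_q * torch.pow(2.0, shift_q)
        return F.linear(x, weight)
```

A training script would then hand `model.parameters()` to the RAdam optimizer (with `--lr 0.01` when training from scratch, per the notes above).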