BayesWatch / deficient-efficient

Successfully training approximations to full-rank matrices for efficiency in deep learning.

Deficient Linear Transforms for Efficient Deep Learning

Compressed linear transforms as drop-in substitutes in deep learning. Substitute them for the convolutions in an existing WideResNet or DARTS network and train as normal. Details of the research are provided in the research log.
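A minimal sketch of what "substitute convolutions into an existing network" can look like, assuming a low-rank bottleneck as the replacement (the helper names `low_rank_conv` and `substitute_convs` are illustrative, not this repository's API):

```python
import torch
import torch.nn as nn

def low_rank_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Replace a KxK convolution with a low-rank pair:
    in_channels -> rank (KxK), then rank -> out_channels (1x1)."""
    return nn.Sequential(
        nn.Conv2d(conv.in_channels, rank,
                  kernel_size=conv.kernel_size,
                  stride=conv.stride, padding=conv.padding, bias=False),
        nn.Conv2d(rank, conv.out_channels, kernel_size=1,
                  bias=conv.bias is not None),
    )

def substitute_convs(model: nn.Module, rank: int) -> None:
    """Walk the module tree and swap every Conv2d in place."""
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(model, name, low_rank_conv(child, rank))
        else:
            substitute_convs(child, rank)
```

After the substitution, the model trains exactly as before; only the optimizer's weight decay needs adjusting, as described below.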

tl;dr

In a deep neural network, you can replace the matrix multiply with a weight matrix (a linear transform) by an alternative that uses fewer parameters, fewer mult-adds, or both, such as a low-rank factorisation or a separable substitution.

However, such a substitution will only train well if you scale the original weight decay used to train the network by the compression ratio.
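One way to read that rule, sketched below under the assumption that the compression ratio is the original parameter count divided by the substituted parameter count (the helper names are illustrative, not this repository's API):

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())

def scaled_weight_decay(base_wd: float, original: nn.Module,
                        compressed: nn.Module) -> float:
    """Scale the baseline weight decay by the compression ratio
    (original parameter count over compressed parameter count)."""
    ratio = count_params(original) / count_params(compressed)
    return base_wd * ratio
```

For example, a 10x compression of a layer would multiply a baseline weight decay of 5e-4 up to 5e-3 under this reading.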

WRN-28-10 on CIFAR-10

DARTS on CIFAR-10

WRN-50-2 on ImageNet

Citations

If you would like to cite this work, please use the following BibTeX entry:

@article{gray2019separable,
  author        = {Gavin Gray and
                   Elliot J. Crowley and
                   Amos Storkey},
  title         = {Separable Layers Enable Structured Efficient Linear Substitutions},
  journal       = {CoRR},
  volume        = {abs/1906.00859},
  year          = {2019},
  url           = {https://arxiv.org/abs/1906.00859},
  archivePrefix = {arXiv},
  eprint        = {1906.00859}
}

Acknowledgements

Based on: https://github.com/BayesWatch/pytorch-moonshine

License

MIT