nncompression

Master's Thesis Project: https://davidturner94.github.com/nncompression


Master's Thesis

TODO


  • Finish moving the previous folder into the new module
  • Find a way to programmatically use the OpenVINO Model Optimizer - should it be through an include or a symlink?
  • Write methods for compressing networks
  • Explore NCS2 Python tools for model performance statistics
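One way to address the Model Optimizer TODO without an include or symlink is to invoke the `mo.py` script as a subprocess. A minimal sketch, assuming a typical OpenVINO install path (the `MO_SCRIPT` location is an assumption; the `--input_model`, `--output_dir`, and `--data_type` flags match the documented `mo.py` CLI, but verify against the installed version):

```python
import subprocess
from pathlib import Path

# Assumed install location -- adjust to the local OpenVINO installation.
MO_SCRIPT = Path("/opt/intel/openvino/deployment_tools/model_optimizer/mo.py")

def build_mo_command(model_path, output_dir, data_type="FP16"):
    """Build the argv list for invoking the Model Optimizer as a subprocess.

    Driving the script via subprocess keeps the project decoupled from the
    OpenVINO install layout (no include or symlink needed).
    """
    return [
        "python3", str(MO_SCRIPT),
        "--input_model", str(model_path),
        "--output_dir", str(output_dir),
        "--data_type", data_type,
    ]

cmd = build_mo_command("resnet18.onnx", "ir_models")
# subprocess.run(cmd, check=True)  # uncomment once OpenVINO is installed
print(cmd)
```

The returned IR (`.xml`/`.bin` pair) can then be loaded by the NCS2 inference tooling.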

Neural Network Compression

Tools

Model Description Libraries

| Library | Maintainer | Description | Status |
| --- | --- | --- | --- |
| PyTorch | Facebook | Facebook-backed and open source. Dynamic computational graph. | Actively developed |
| TensorFlow | Google | Google-backed. TensorFlow Lite for mobile deployment. Python and C++ APIs. Static graph. | Actively developed |
| Keras | François Chollet/Google | Developed by Google employee François Chollet. Great for quick prototyping. Wrapper for TensorFlow, Theano, and CNTK; tightly integrated with TensorFlow 2. | Active |
| CNTK | Microsoft | Toolkit that describes neural networks as a series of computational steps via a directed graph. | Deprecated |
| ONNX | Facebook/Microsoft (joint venture) | Widespread enterprise use. Multiple libraries convert to a universal model format. Interchangeable AI models. | Actively developed |
| MXNet | Apache | Wide industry support and a large user community. Focus on scalability over multiple GPUs and portability, with a large number of supported languages and most major OSes. Has libraries expanding on core functionality for NLP and CV. | Actively developed |
| Chainer | Community/Preferred Networks, Inc. | Open-source deep learning framework written purely in Python on top of NumPy and CuPy. First to use "define-by-run" (dynamic computational graph); focuses on object-oriented design for defining neural networks. | Actively developed |
| Caffe | UC Berkeley | Focus on expression, speed, and modularity. Last stable release over two years ago, but still supported by many pruning and deployment libraries, so still used in industry. | Deprecated |
| Caffe2 | Facebook (merged with PyTorch as of 2018) | Focus on being lightweight and modular. Caffe2Go for mobile deployment. | Deprecated |
| Theano | Montreal Institute for Learning Algorithms, University of Montreal | Focus on defining, optimizing, and evaluating mathematical expressions involving multi-dimensional arrays efficiently. Last release in 2017; only just being phased out as a backend in major libraries. | Deprecated |

Compression Libraries and extensions

| Tool | Maintainer | Description |
| --- | --- | --- |
| Brevitas | Xilinx Research | Neural network quantization framework from the Xilinx Research FINN project; originally built on Theano, now being migrated to PyTorch. |
| Vitis | Xilinx | Commercial software suite whose "Vitis AI" library supports Caffe and TensorFlow (and potentially PyTorch). Includes an AI Optimizer module for pruning, an AI Quantizer for quantization, and an AI Compiler for optimizing code for the DPU (Deep Learning Processing Unit), a layer on top of the bare-metal FPGA. |
| Distiller | Intel | Compression library on top of PyTorch with state-of-the-art algorithms. Includes pruning, quantization, regularization, knowledge distillation, and conditional computation. |
| Keras-Surgeon | Ben Whetton | Pruning for Keras models. Last updated a year ago. |
| QNNPACK | Facebook | Mobile-optimized library for low-precision, high-performance neural network inference. Focus on quantization. Intended not for research but for high-level frameworks. |
| MXNet Contrib Quantization | Apache/Community | MXNet Contrib library for quantization. |
| tensorflow_model_optimization | Google | Model optimization techniques via quantization and pruning APIs. |
| ChainerPruner | tkat0 | Channel pruning for Chainer. |
| TensorFlow Lite | Google | Geared towards mobile deployment. Model optimization through quantization. Two components: the TF Lite interpreter, and the TF Lite converter, which converts TensorFlow models to TFLite-optimized models. |
| Optuna | Optuna | Automatic hyperparameter optimization software framework. |
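The libraries above largely implement variants of two core techniques: magnitude pruning (zero out the smallest weights) and affine quantization (map floats onto a low-precision integer grid). A minimal framework-independent sketch of both, in pure Python (the function names and the 8-bit range are illustrative choices, not any library's API):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest weight (ties may prune extra).
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_uint8(weights):
    """Affine quantization: w ~ scale * (q - zero_point), q in [0, 255]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid div-by-zero for constant inputs
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float weights."""
    return [scale * (qi - zero_point) for qi in q]

w = [0.31, -0.02, 0.77, 0.05, -0.64, 0.01]
pruned = magnitude_prune(w, 0.5)   # half the weights become exactly 0.0
q, s, z = quantize_uint8(w)
recovered = dequantize(q, s, z)    # close to w, within one quantization step
```

The frameworks add the parts that matter in practice: pruning schedules with fine-tuning between steps, per-channel scales, and fake-quantization during training.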

Deployment Tools

| Tool | Maintainer | Description |
| --- | --- | --- |
| TVM Neural Network Compiler Stack | Apache | Relay is an intermediate-representation library for building dataflow computational graphs. VTA is a programmable accelerator and an end-to-end solution including drivers, a JIT runtime, and an optimizing compiler stack based on TVM. Includes deployment and simulation tools for FPGAs. |
| Glow | Facebook | PyTorch compiler. |
| OpenVINO | Intel | Deployment and model optimization for FPGA. |
| FINN-HLS | Xilinx Research | Vivado HLS library for FINN. |
| Vitis | Xilinx | Commercial software stack including Vivado for simulation and deployment. |

References

Papers I'm Reading


Papers on pruning

List of papers on pruning: Link. Another collection of compression techniques: link.
