
Awesome EMDL

Embedded and mobile deep learning research notes.

Papers

Quantization

Pruning

Awesome-Pruning [Repo]
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [CVPR'19]
To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
Pruning Filters for Efficient ConvNets [ICLR'17]
Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
Soft Weight-Sharing for Neural Network Compression [ICLR'17]
Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
Dynamic Network Surgery for Efficient DNNs [NIPS'16]
Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]

Approximation

High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
Accelerating Very Deep Convolutional Networks for Classification and Detection (Extended version of above one)
Convolutional neural networks with low-rank regularization [arXiv'15]
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]

Characterization

Libraries

Inference Framework

Alibaba - MNN - is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba.
Apple - CoreML - lets you integrate machine learning models into your app. BERT and GPT-2 on iPhone
Arm - ComputeLibrary - is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies. Intro
Arm - Arm NN - is the most performant machine learning (ML) inference engine for Android and Linux, accelerating ML on Arm Cortex-A CPUs and Arm Mali GPUs.
Baidu - Paddle Lite - is a multi-platform, high-performance deep learning inference engine.
DeepLearningKit - is an open source deep learning framework for Apple's iOS, OS X and tvOS.
Edge Impulse - Interactive platform to generate models that can run on microcontrollers. They are also quite active on social networks, sharing recent news on EdgeAI/TinyML.
Google - TensorFlow Lite - is an open source deep learning framework for on-device inference.
Intel - OpenVINO - Comprehensive toolkit to optimize your processes for faster inference.
JDAI Computer Vision - dabnn - is an accelerated binary neural networks inference framework for mobile platform.
Meta - PyTorch Mobile - is a new framework for helping mobile developers and machine learning engineers embed PyTorch ML models on-device.
Microsoft - DeepSpeed - is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Microsoft - ELL - allows you to design and deploy intelligent machine-learned models onto resource constrained platforms and small single-board computers, like Raspberry Pi, Arduino, and micro:bit.
Microsoft - ONNX Runtime - is a cross-platform, high-performance ML inferencing and training accelerator.
Nvidia - TensorRT - is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
OAID - Tengine - is a lightweight, high-performance, modular inference engine for embedded devices.
Qualcomm - Neural Processing SDK for AI - Libraries that let developers run NN models on Snapdragon mobile platforms, taking advantage of the CPU, GPU and/or DSP.
Tencent - ncnn - is a high-performance neural network inference framework optimized for the mobile platform.
uTensor - AI inference library based on mbed (an RTOS for ARM chipsets) and TensorFlow.
XiaoMi - Mace - is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
xmartlabs - Bender - Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.

Optimization Tools

Neural Network Distiller - Python package for neural network compression research.
PocketFlow - An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.