efficient-inference

There are 2 repositories under the efficient-inference topic.

huawei-noah / Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
convolutional-neural-networks efficient-inference imagenet model-compression tensorflow pytorch ghostnet transformer pretrained-models vision-transformer
Language:Python 3933
SqueezeAILab / LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
function-calling llm llm-agent llm-agents llms parallel-function-call efficient-inference large-language-models llama llama2 llm-framework natural-language-processing nlp transformer
Language:Python 1287
snap-research / EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
deep-learning detection efficient-inference efficient-neural-networks pytorch semantic-segmentation transformer imagenet transformers mobile-devices
Language:Python 966
huawei-noah / AdderNet
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
pytorch imagenet convolutional-neural-networks cvpr2020 efficient-inference
Language:Python 953
horseee / DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
diffusion-models efficient-inference model-compression stable-diffusion training-free
Language:Python 711
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
efficient-inference large-language-models llm model-compression natural-language-processing post-training-quantization quantization text-generation transformer llama localllm small-models
Language:Python 607
liuzhuang13 / slimming
Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
deep-learning convolutional-neural-networks efficient-inference
Language:Lua 553
VITA-Group / LightGaussian
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
3d-reconstruction efficient-inference gaussian-splatting
Language:Python 517
Zhen-Dong / Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
awesome-list diffusion-models edge-computing efficient-inference large-language-models model-compression neural-networks papers quantization
379
SqueezeAILab / KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
compression efficient-inference efficient-model large-language-models llama llm localllama localllm mistral model-compression natural-language-processing quantization small-models text-generation transformer
Language:Python 244
The-Learning-And-Vision-Atelier-LAVA / SMSR
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
super-resolution sparsity efficient-inference
Language:Python 236
changlin31 / DS-Net
(CVPR 2021, Oral) Dynamic Slimmable Network
dynamic-networks pruning network-pruning dynamic-pruning model-compression efficient-inference
Language:Python 225
xindongzhang / ELAN
[ECCV2022] Efficient Long-Range Attention Network for Image Super-resolution
efficient-inference super-resolution transformer
Language:Python 196
liuziwei7 / mobile-id
Deep Face Model Compression
computer-vision deep-learning face-recognition model-compression efficient-inference
Language:MATLAB 195
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
artificial-intelligence deep-learning efficient-inference transformers
Language:Python 177
cure-lab / DeciWatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
2d-human-pose 3d-pose-estimation body-reconstruction efficient-inference human-pose-estimation 3d-body-recovery deep-learning efficiency efficient-neural-networks pose-estimation pytorch eccv eccv2022
Language:Python 171
Picovoice / picollm
On-device LLM Inference Powered by X-Bit Quantization
llm compression efficient-inference gemma generative-ai language-model language-models large-language-model llama llama2 llama3 llms mistral mixtral model-compression natural-language-processing quantization self-hosted llm-inference
Language:Python 133
czg1225 / AsyncDiff
Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
diffusion-models distributed-computing inference-acceleration training-free efficient-inference stable-diffusion text-to-image text-to-video
Language:Python 123
RAIVNLab / STR
Soft Threshold Weight Reparameterization for Learnable Sparsity
sparsity learnable-sparsity sparsity-optimization cnn efficient-inference edge-machine-learning soft-thresholding str imagenet resource-efficient icml-2020 icml icml2020 soft-threshold-reparameterization
Language:Python 86
snap-research / graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
deep-learning distillation efficient-inference graph-algorithm graph-neural-networks knowledge-distillation pytorch gnn scalability
Language:Python 82
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
decoding efficient-inference fast-inference llm speculative-execution speculative-decoding
Language:Python 81
FranxYao / Partially-Observed-TreeCRFs
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
crf named-entity-recognition efficient-inference nested-named-entity-recognition tree-crf tree-structure sum-product-algorithm sum-product
Language:Python 52
IBM / AdaMML
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
computer-vision multimodal-learning deep-learning efficient-inference
Language:Python 49
horseee / learning-to-cache
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
diffusion-models efficient-inference
Language:Python 48
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
autoregressive-models early-exiting efficient-inference nlp llms
Language:Python 47
tchittesh / lzu
Code for Learning to Zoom and Unzoom (CVPR 2023)
3d-detection autonomous-driving efficient-inference spatial-attention
Language:Python 46
ivclab / agegenderLMTCNN
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
age-gender-cnn mobile-application multi-task-learning tensorflow android-application efficient-inference deep-neural-networks
Language:Python 40
yikaiw / RS-Nets
[ECCV 2020] Code release for "Resolution Switchable Networks for Runtime Efficient Image Recognition"
eccv2020 rsnet multi-resolution ensemble distillation quantization efficient-inference switchable
Language:Python 40
bharathsudharsan / TinyML-Benchmark-NNs-on-MCUs
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
tinyml-benchmark raspberry-pi-pico mcu-boards arduinio armcortexm0 armcortexm4 armcortexm7 machine-learning tinyml efficient-inference tflite tfmicro c-code-generator cmsis-nn
Language:Python 31
linksense / EfficientNet.PyTorch
Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.
efficientnet efficient-model efficient-inference pytorch efficientnet-pytorch efficientnet-pretrained efficientseg imagenet pretrained-weights
Language:Python 31
Zhen-Dong / CoDeNet
[FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Designed deformable convolution.
fpgas pretrained-models quantization deformable-convnets centernet pytorch efficient-inference detector efficient object-detection
Language:Python 26
bharathsudharsan / CNN_on_MCU
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
optimization quantization-aware-training quantization graph-optimization tflite tflite-conversion tinyml cmsis-nn efficient-inference edge-computing neuralnetworks c-code-generator
Language:Jupyter Notebook 24
VITA-Group / triple-wins
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
adversarial-robustness adversarial-attacks triple-wins efficiency robustness efficient-inference
Language:Python 24
ivclab / NeuralMerger
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
deep-neural-networks multi-task-learning cnn-compression unifying-and-merging-cnn efficient-inference tensorflow multi-modal-learning
Language:Python 20
xternalz / SDPoint
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
convolutional-networks resnets resnet efficient-inference pooling downsampling convolutional-neural-networks efficient-model deep-neural-networks deep-learning deep-learning-algorithms computer-vision regularization cost-adjustable resnext preact-resnet imagenet imagenet-dataset batch-normalization batchnorm
Language:Python 18
snap-research / linkless-link-prediction
[ICML 2023] Linkless Link Prediction via Relational Distillation
deep-learning distillation efficient-inference gnn graph-neural-networks knowledge-distillation link-prediction scalability
Language:Python 17
