There are 26 repositories under the efficient-inference topic.
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
LLMCompiler: An LLM Compiler for Parallel Function Calling
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
SqueezeLLM: Dense-and-Sparse Quantization
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
List of papers related to neural network quantization in recent AI conferences and journals.
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
(CVPR 2021, Oral) Dynamic Slimmable Network
[ECCV2022] Efficient Long-Range Attention Network for Image Super-resolution
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Explorations into some recent techniques surrounding speculative decoding
[NeurIPS'23] Speculative Decoding with Big Little Decoder
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 2020.
[ICML 2023] Linkless Link Prediction via Relational Distillation
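Several of the repositories above (SqueezeLLM, KVQuant, and the quantization paper list) center on quantization for efficient inference. As background, a minimal sketch of symmetric per-tensor int8 weight quantization — the function names here are illustrative, not taken from any of the listed repositories:

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Map float weights to signed ints with one shared scale (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(w)) / qmax        # per-tensor scale
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from ints and the stored scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
# per-element rounding error is bounded by scale / 2
```

Real systems in this list go well beyond this sketch (per-channel scales, dense-and-sparse decomposition in SqueezeLLM, KV-cache-specific schemes in KVQuant), but the round-to-scale core is the same.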