There are 16 repositories under the inference-optimization topic.
Batch normalization fusion for PyTorch
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Optimize layers structure of Keras model to reduce computation time
A set of tools that make working with TensorRT and ONNX Runtime easier. This repo is designed for YOLOv3
A constrained expectation-maximization algorithm for feasible graph inference.
Batch Partitioning for Multi-PE Inference with TVM (2020)
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
PyTorch Mobile: iOS examples
YOLOv8 - Object detection
A compilation of various ML and DL models and ways to optimize their inference.
A simple tool that applies structure-level optimizations (e.g. Quantization) to a TensorFlow model
Improving Natural Language Processing tasks using BERT-based models
PyTorch Mobile: Android examples of usage in applications
ncnn is a high-performance neural network inference framework optimized for the mobile platform
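One recurring technique in this list is batch-normalization fusion (the PyTorch entry above). The idea is that at inference time a BatchNorm layer is an affine transform, so its statistics can be folded into the preceding convolution's weights and bias. A minimal NumPy sketch of that weight-folding arithmetic (shapes and names are illustrative, not taken from any listed repo):

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen BatchNorm statistics into a conv's weights and bias.

    w: conv weights, shape (C_out, C_in, kH, kW)
    b: conv bias, shape (C_out,)
    gamma, beta, mean, var: BatchNorm parameters/statistics, shape (C_out,)
    """
    scale = gamma / np.sqrt(var + eps)           # per-output-channel scale
    w_fused = w * scale[:, None, None, None]     # scale every output filter
    b_fused = (b - mean) * scale + beta          # shift the bias accordingly
    return w_fused, b_fused
```

Running the fused conv alone then produces the same activations as conv followed by BatchNorm, while saving one elementwise pass per layer at inference time.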
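The LLM-serving entry above mentions KV caching: during autoregressive decoding, the keys and values of already-processed tokens are stored so each step only computes the new token's projections instead of re-attending from scratch. A rough single-head NumPy sketch of that decode loop (the `attend` helper and all shapes are hypothetical, not from Predibase's LoRAX):

```python
import numpy as np

def attend(q, k_cache, v_cache):
    # Softmax attention of one query over all cached keys/values.
    scores = q @ k_cache.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache

d = 8
k_cache = np.empty((0, d))   # grows by one row per decoded token
v_cache = np.empty((0, d))
rng = np.random.default_rng(0)
for step in range(4):
    # Stand-ins for the new token's projected key, value, and query.
    k_new, v_new, q = rng.normal(size=(3, d))
    k_cache = np.vstack([k_cache, k_new])
    v_cache = np.vstack([v_cache, v_new])
    out = attend(q, k_cache, v_cache)
```

The point of the cache is that each step costs O(t) attention work against stored tensors rather than O(t) fresh key/value projections over the whole prefix; production servers additionally page and quantize this cache.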