There are 4 repositories under the inference-optimization topic.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Batch normalization fusion for PyTorch
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Optimize the layer structure of Keras models to reduce computation time
A set of tools that make working with TensorRT and ONNX Runtime easier. This repo is designed for YOLOv3.
[WIP] A template for getting started writing code using GGML
A constrained expectation-maximization algorithm for feasible graph inference.
Faster inference YOLOv8: Optimize and export YOLOv8 models for faster inference using OpenVINO and Numpy 🔢
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
Batch Partitioning for Multi-PE Inference with TVM (2020)
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
MLP-Rank: A graph theoretical approach to structured pruning of deep neural networks based on weighted Page Rank centrality as introduced by the related thesis.
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
PyTorch Mobile: Android examples of usage in applications
PyTorch Mobile: iOS examples
Interface for running inference with TensorRT engines, with an example that uses a YOLOv4 engine.
YOLOv8 - Object detection
A compilation of various ML and DL models and ways to optimize their inference.
A simple tool that applies structure-level optimizations (e.g. quantization) to a TensorFlow model
OnnxRT-based inference optimization of a RoBERTa model trained for sentiment analysis on a Twitter dataset
Improving Natural Language Processing tasks using BERT-based models
ncnn is a high-performance neural network inference framework optimized for the mobile platform
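Batch normalization fusion, listed above for PyTorch, folds an inference-mode batch norm layer into the layer before it. With frozen running statistics, BN is a per-channel affine map, so it can be absorbed into that layer's weights and bias. The sketch below is a minimal illustration in plain Python for a dense layer, not that repository's implementation. The function names `fold_batchnorm`, `linear`, and `batchnorm` are hypothetical. Each output row is scaled by gamma / sqrt(var + eps), and the bias becomes (b - mean) * scale + beta:

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode batch norm into the preceding dense layer.

    Hypothetical helper. weight holds one row per output channel, and
    gamma/beta/mean/var hold the per-channel BN parameters and running stats.
    Returns (fused_weight, fused_bias) so that
    linear(x, fused_weight, fused_bias) == batchnorm(linear(x, weight, bias)).
    """
    fused_w, fused_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        fused_w.append([w * scale for w in row])  # scale every weight in the row
        fused_b.append((b - m) * scale + bt)      # shift and scale the bias
    return fused_w, fused_b


def linear(x, weight, bias):
    """Dense layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]
```

For a convolution, the same per-output-channel scaling applies to each filter. The fused layer then produces the same outputs, up to floating-point rounding, while skipping one elementwise pass at inference time.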