There are 29 repositories under the inference-server topic.
The simplest way to serve AI/ML models in production
An open-source computer vision framework to build and deploy apps in minutes
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
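To make "model inference server" concrete, here is a minimal sketch of one using only the Python standard library. The `predict` function is a stand-in for a real model, and all names here are illustrative, not any particular project's API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a real model: sums the feature vector.
    A real server would run a loaded ML model here."""
    return {"score": sum(features)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read a JSON body such as {"features": [1.0, 2.0]} ...
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # ...run the model, and return the result as JSON.
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks forever):
# HTTPServer(("127.0.0.1", 8080), InferenceHandler).serve_forever()
```

Production servers add model loading, batching, health checks, and metrics on top of this request/response core.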
A REST API for Caffe using Docker and Go
This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
This is a repository for an object detection inference API using the Tensorflow framework.
Orkhon: ML Inference Framework and Server Runtime
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
Deploy DL/ML inference pipelines with minimal extra code.
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Friendli: the fastest serving engine for generative AI
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT, and multi-format Triton server). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
Fullstack machine learning inference template
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
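A serving benchmark like the one above boils down to timing many requests and reporting latency percentiles and throughput. A minimal stdlib sketch, with a stubbed model call standing in for a real request to a serving endpoint (all names illustrative):

```python
import time
import statistics

def fake_inference(prompt):
    # Stand-in for a real request to a model endpoint;
    # burns a little CPU to simulate model latency.
    return sum(ord(c) for c in prompt)

def benchmark(fn, payload, n=200):
    """Time n sequential calls; report latency stats in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (n - 1))],
        "throughput_rps": n / (sum(latencies) / 1000.0),
    }

stats = benchmark(fake_inference, "hello world")
```

Real LLM benchmarks additionally measure time-to-first-token and tokens per second under concurrent load, but the percentile bookkeeping is the same.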
Inference Server Implementation from Scratch for Machine Learning Models
Session Based Real-time Hotel Recommendation Web Application
A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.
Modelz is a developer-first platform for prototyping and deploying machine learning models.
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities, with Redis as the feature and model store and RedisAI as the inference server.
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Serve PyTorch inference requests using batching with Redis for faster performance.
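The batching idea in the entry above can be sketched without Redis: requests accumulate in a queue and are processed together in one model call, amortizing per-request overhead. In the real project a Redis list plays the queue's role across processes; everything here is an illustrative stand-in:

```python
import queue

def model_batch_predict(batch):
    # Stand-in for one batched forward pass; real code would push
    # the whole batch through a PyTorch model in a single call.
    return [x * 2 for x in batch]

def serve_batches(requests, max_batch=8):
    """Drain a queue of requests in batches of up to max_batch items."""
    q = queue.Queue()
    for r in requests:
        q.put(r)
    results = []
    while not q.empty():
        batch = []
        while len(batch) < max_batch and not q.empty():
            batch.append(q.get())
        results.extend(model_batch_predict(batch))
    return results
```

Batching helps because one forward pass over N inputs is far cheaper on a GPU than N passes over single inputs; the trade-off is added latency while the batch fills.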
Client/Server system to perform distributed inference on high load systems.
Vision and vision-multi-modal components for the geniusrise framework
Text components powering LLMs and SLMs for the geniusrise framework
Audio components for the geniusrise framework
Run your own production inference code with SageMaker
An AI-powered mobile crop advisory app for farmers and gardeners that provides information about a crop from an image taken by the user. It supports 10 crops and 37 kinds of crop diseases. The AI model is a ResNet fine-tuned on crop images collected by web-scraping Google Images and from the PlantVillage dataset.