There are 33 repositories under the inference-server topic.
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Turn any computer or edge device into a command center for your computer vision projects.
The simplest way to serve AI/ML models in production
An open-source computer vision framework to build and deploy apps in minutes
Python + Inference: a model deployment library in Python, and the simplest model inference server ever.
A REST API for Caffe using Docker and Go
This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Work with LLMs in a local environment using containers
This is a repository for an object detection inference API using the TensorFlow framework.
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
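As a rough sketch of the kind of ONNX inference such a server wraps, the snippet below uses the onnxruntime Python API directly; the model path, input name, and input shape are placeholders, not files or settings shipped with the project.

```python
# Minimal sketch of the ONNX inference an ONNX serving layer wraps, using the
# onnxruntime Python API directly. "model.onnx" and the (1, 3, 224, 224)
# input shape are placeholders for illustration only.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name            # e.g. "input"
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})      # list of output arrays
print(outputs[0].shape)
```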
Orkhon: ML Inference Framework and Server Runtime
K3ai is a lightweight, fully automated AI infrastructure-in-a-box solution that lets anyone experiment quickly with Kubeflow pipelines. K3ai works anywhere from edge devices to laptops.
Deploy DL/ML inference pipelines with minimal extra code.
Friendli: the fastest serving engine for generative AI
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
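The PyTorch -> ONNX step of such a conversion chain can be sketched with torch.onnx.export; the checkpoint name, input resolution, and dynamic-axis names below are assumptions for illustration, not this repository's exact settings.

```python
# Sketch of the PyTorch -> ONNX step in a PyTorch -> ONNX -> TensorRT chain.
# The weights file (assumed to be a full serialized module), input resolution,
# and dynamic-axis names are illustrative assumptions.
import torch

model = torch.load("craft_mlt_25k.pth", map_location="cpu")
model.eval()

dummy = torch.randn(1, 3, 768, 768)  # NCHW image tensor
torch.onnx.export(
    model, dummy, "craft.onnx",
    input_names=["input"], output_names=["scores"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
    opset_version=13,
)
# The resulting craft.onnx can then be built into a TensorRT engine
# (e.g. with trtexec) and placed in the Triton model repository.
```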
Fullstack machine learning inference template
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
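A minimal latency-benchmark loop against any HTTP serving endpoint might look like the sketch below; the URL and JSON payload are hypothetical, and a real benchmark would also vary concurrency and payload sizes.

```python
# Minimal latency benchmark against an HTTP inference endpoint. The URL and
# payload are hypothetical placeholders; real benchmarks would also sweep
# concurrency levels and input sizes.
import time
import statistics
import requests

URL = "http://localhost:8000/predict"   # hypothetical endpoint
payload = {"text": "hello world"}

latencies = []
for _ in range(100):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=30)
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50={statistics.median(latencies) * 1000:.1f} ms  "
      f"p95={latencies[int(0.95 * len(latencies))] * 1000:.1f} ms")
```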
Inference Server Implementation from Scratch for Machine Learning Models
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Session-Based Real-Time Hotel Recommendation Web Application
A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.
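The core idea, loading Whisper once and keeping it resident behind an HTTP endpoint, can be sketched with FastAPI and openai-whisper; the route, model size, and temp-file handling below are assumptions, not this repository's actual API.

```python
# Sketch of keeping a Whisper model resident behind an HTTP endpoint so it is
# loaded once, not per request. Route name, model size, and temp-file handling
# are illustrative assumptions, not this repository's actual API.
import tempfile
import whisper                      # openai-whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # loaded once, reused for every request

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```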
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities, with Redis as the feature and model store and RedisAI as the inference server.
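A rough sketch of calling a model stored in RedisAI through redis-py follows; the key names and feature values are invented, and the commands shown use the legacy AI.MODELRUN form (newer RedisAI versions use AI.MODELEXECUTE with explicit INPUTS/OUTPUTS counts).

```python
# Rough sketch of running a model stored in RedisAI via redis-py. Key names
# and feature values are invented; the legacy AI.MODELRUN syntax is shown,
# while newer RedisAI versions use AI.MODELEXECUTE with INPUTS/OUTPUTS counts.
import redis

r = redis.Redis(host="localhost", port=6379)

# Write a feature vector as an input tensor (four made-up float features).
r.execute_command("AI.TENSORSET", "loan:in", "FLOAT", 1, 4,
                  "VALUES", 0.3, 0.1, 0.7, 0.2)

# Run the model previously stored under "loan:model" and read the output.
r.execute_command("AI.MODELRUN", "loan:model",
                  "INPUTS", "loan:in", "OUTPUTS", "loan:out")
probability = r.execute_command("AI.TENSORGET", "loan:out", "VALUES")
print(probability)
```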
Modelz is a developer-first platform for prototyping and deploying machine learning models.
Vision and vision-multimodal components for the geniusrise framework
Text components powering LLMs and SLMs for the geniusrise framework
Serve PyTorch inference requests using batching with Redis for faster performance.
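The batching pattern can be sketched as a worker that drains queued requests from a Redis list, runs one forward pass over the whole batch, and writes results back to per-request keys; the queue and key names, batch size, and stand-in model below are assumptions for illustration.

```python
# Sketch of Redis-backed request batching: drain queued requests, run a single
# forward pass over the batch, write results to per-request keys. Queue/key
# names, batch size, and the stand-in model are illustrative assumptions.
import json
import redis
import torch

r = redis.Redis()
model = torch.nn.Linear(4, 2).eval()   # stand-in for a real PyTorch model
BATCH = 8

while True:
    # Block for the first request, then greedily drain up to BATCH - 1 more.
    _, first = r.blpop("infer:queue")
    items = [json.loads(first)]
    while len(items) < BATCH:
        nxt = r.lpop("infer:queue")
        if nxt is None:
            break
        items.append(json.loads(nxt))

    batch = torch.tensor([it["features"] for it in items], dtype=torch.float32)
    with torch.no_grad():
        preds = model(batch)

    for it, pred in zip(items, preds):
        r.set(f"infer:result:{it['id']}", json.dumps(pred.tolist()))
```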
Client/server system to perform distributed inference on high-load systems.
Different ways of implementing an API to serve an image classification model
Audio components for the geniusrise framework
Effortlessly Deploy and Serve Large Language Models in the Cloud as an API Endpoint for Inference