There are 27 repositories under the serving topic.
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A flexible, high-performance serving system for machine learning models
AI + Data, online. https://vespa.ai
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge. Covers 20+ mainstream scenarios across image, video, text and audio, and 150+ SOTA models with end-to-end optimization and multi-platform, multi-framework support.
Database system for AI-powered apps
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
TensorFlow template application for deep learning
A comprehensive guide to building RAG-based LLM applications for production.
RayLLM - LLMs on Ray
A flexible, high-performance carrier for machine learning models (a serving and deployment framework for PaddlePaddle)
Generic and easy-to-use serving service for machine learning models
A scalable inference server for models optimized with OpenVINO™
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
ML pipeline orchestration and model deployments on Kubernetes.
A high-performance inference system for large language models, designed for production environments.
MLOps Platform
Blockchain Search with GraphQL APIs
A universal scalable machine learning model deployment solution
MLModelCI is a complete MLOps platform for managing, converting, profiling, and deploying MLaaS (Machine Learning-as-a-Service), bridging the gap between current ML training and serving systems.
Bring Keras models to production with TensorFlow Serving and Node.js + Docker :pizza:
A deep learning model deployment framework supporting tf/torch/trt/trtllm/vllm and more NN frameworks, with dynamic batching and streaming modes, and dual-language Python/C++ support; rate-limitable, extensible, and high-performance. It helps users quickly deploy models online and serve them via HTTP/RPC interfaces.
ClearML - Model-Serving Orchestration and Repository Solution
TensorFlow Serving ARM - A project for cross-compiling TensorFlow Serving targeting popular ARM cores
Deploy DL/ML inference pipelines with minimal extra code.
A collection of model deployment libraries and techniques.