There are 12 repositories under the serving topic.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A flexible, high-performance serving system for machine learning models
A Cloud Native Batch System (Project under CNCF)
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Database system for AI-powered apps
TensorFlow template application for deep learning
A comprehensive guide to building RAG-based LLM applications for production.
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
A scalable inference server for models optimized with OpenVINO™
Generic and easy-to-use serving service for machine learning models
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
A high-performance inference system for large language models, designed for production environments.
ML pipeline orchestration and model deployments on Kubernetes.
MLOps Platform
A universal scalable machine learning model deployment solution
Blockchain Search with GraphQL APIs
MLModelCI is a complete MLOps platform for managing, converting, profiling, and deploying MLaaS (Machine Learning-as-a-Service), bridging the gap between current ML training and serving systems.
Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. Dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. Helps users quickly deploy models and serve them through HTTP/RPC interfaces.
ClearML - Model-Serving Orchestration and Repository Solution
Bring Keras models to production with TensorFlow Serving and Node.js + Docker :pizza:
TensorFlow Serving ARM - A project for cross-compiling TensorFlow Serving targeting popular ARM cores
Deploy DL/ML inference pipelines with minimal extra code.
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
A collection of model deployment libraries and techniques.