There are 12 repositories under the serving topic.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A flexible, high-performance serving system for machine learning models
A Cloud Native Batch System (Project under CNCF)
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Database system for AI-powered apps
TensorFlow template application for deep learning
A comprehensive guide to building RAG-based LLM applications for production.
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
A scalable inference server for models optimized with OpenVINO™
Generic and easy-to-use serving service for machine learning models
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
A high-performance inference system for large language models, designed for production environments.
ML pipeline orchestration and model deployments on Kubernetes.
MLOps Platform
A universal scalable machine learning model deployment solution
Blockchain Search with GraphQL APIs
MLModelCI is a complete MLOps platform for managing, converting, profiling, and deploying MLaaS (Machine Learning-as-a-Service), bridging the gap between current ML training and serving systems.
Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. Dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. Helps users quickly deploy models and serve them through HTTP/RPC interfaces.
ClearML - Model-Serving Orchestration and Repository Solution
Bring Keras models to production with TensorFlow Serving and Node.js + Docker :pizza:
TensorFlow Serving ARM - A project for cross-compiling TensorFlow Serving targeting popular ARM cores
Deploy DL/ML inference pipelines with minimal extra code.
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
A collection of model deployment libraries and techniques.