There are 33 repositories under the inference-server topic.
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Turn any computer or edge device into a command center for your computer vision projects.
The simplest way to serve AI/ML models in production
An open-source computer vision framework to build and deploy apps in minutes
Python + Inference: a model deployment library in Python, and the simplest model inference server ever.
A REST API for Caffe using Docker and Go
This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
Work with LLMs in a local environment using containers
This is a repository for an object detection inference API using the TensorFlow framework.
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
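As a rough sketch of the kind of ONNX inference such a server wraps, the snippet below uses the onnxruntime Python API directly; the model path, input name, and input shape are placeholders, not files or settings shipped with the project.

```python
# Minimal sketch of the ONNX inference an ONNX serving layer wraps, using the
# onnxruntime Python API directly. "model.onnx" and the (1, 3, 224, 224)
# input shape are placeholders for illustration only.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name            # e.g. "input"
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})      # list of output arrays
print(outputs[0].shape)
```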
Orkhon: ML Inference Framework and Server Runtime
K3ai is a lightweight, fully automated AI infrastructure-in-a-box solution that lets anyone experiment quickly with Kubeflow pipelines. K3ai works anywhere from edge devices to laptops.
Deploy DL/ML inference pipelines with minimal extra code.
Friendli: the fastest serving engine for generative AI
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
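The PyTorch -> ONNX step of such a conversion chain can be sketched with torch.onnx.export; the checkpoint name, input resolution, and dynamic-axis names below are assumptions for illustration, not this repository's exact settings.

```python
# Sketch of the PyTorch -> ONNX step in a PyTorch -> ONNX -> TensorRT chain.
# The weights file (assumed to be a full serialized module), input resolution,
# and dynamic-axis names are illustrative assumptions.
import torch

model = torch.load("craft_mlt_25k.pth", map_location="cpu")
model.eval()

dummy = torch.randn(1, 3, 768, 768)  # NCHW image tensor
torch.onnx.export(
    model, dummy, "craft.onnx",
    input_names=["input"], output_names=["scores"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
    opset_version=13,
)
# The resulting craft.onnx can then be built into a TensorRT engine
# (e.g. with trtexec) and placed in the Triton model repository.
```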
Fullstack machine learning inference template
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
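A minimal latency-benchmark loop against any HTTP serving endpoint might look like the sketch below; the URL and JSON payload are hypothetical, and a real benchmark would also vary concurrency and payload sizes.

```python
# Minimal latency benchmark against an HTTP inference endpoint. The URL and
# payload are hypothetical placeholders; real benchmarks would also sweep
# concurrency levels and input sizes.
import time
import statistics
import requests

URL = "http://localhost:8000/predict"   # hypothetical endpoint
payload = {"text": "hello world"}

latencies = []
for _ in range(100):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=30)
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"p50={statistics.median(latencies) * 1000:.1f} ms  "
      f"p95={latencies[int(0.95 * len(latencies))] * 1000:.1f} ms")
```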
Inference Server Implementation from Scratch for Machine Learning Models
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Session-Based Real-Time Hotel Recommendation Web Application
A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.
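The core idea, loading Whisper once and keeping it resident behind an HTTP endpoint, can be sketched with FastAPI and openai-whisper; the route, model size, and temp-file handling below are assumptions, not this repository's actual API.

```python
# Sketch of keeping a Whisper model resident behind an HTTP endpoint so it is
# loaded once, not per request. Route name, model size, and temp-file handling
# are illustrative assumptions, not this repository's actual API.
import tempfile
import whisper                      # openai-whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # loaded once, reused for every request

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```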
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities, with Redis as the feature and model store and RedisAI as the inference server.
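A rough sketch of calling a model stored in RedisAI through redis-py follows; the key names and feature values are invented, and the commands shown use the legacy AI.MODELRUN form (newer RedisAI versions use AI.MODELEXECUTE with explicit INPUTS/OUTPUTS counts).

```python
# Rough sketch of running a model stored in RedisAI via redis-py. Key names
# and feature values are invented; the legacy AI.MODELRUN syntax is shown,
# while newer RedisAI versions use AI.MODELEXECUTE with INPUTS/OUTPUTS counts.
import redis

r = redis.Redis(host="localhost", port=6379)

# Write a feature vector as an input tensor (four made-up float features).
r.execute_command("AI.TENSORSET", "loan:in", "FLOAT", 1, 4,
                  "VALUES", 0.3, 0.1, 0.7, 0.2)

# Run the model previously stored under "loan:model" and read the output.
r.execute_command("AI.MODELRUN", "loan:model",
                  "INPUTS", "loan:in", "OUTPUTS", "loan:out")
probability = r.execute_command("AI.TENSORGET", "loan:out", "VALUES")
print(probability)
```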
Modelz is a developer-first platform for prototyping and deploying machine learning models.
Vision and vision-multimodal components for the geniusrise framework
Text components powering LLMs and SLMs for the geniusrise framework
Serve PyTorch inference requests using batching with Redis for faster performance.
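The batching pattern can be sketched as a worker that drains queued requests from a Redis list, runs one forward pass over the whole batch, and writes results back to per-request keys; the queue and key names, batch size, and stand-in model below are assumptions for illustration.

```python
# Sketch of Redis-backed request batching: drain queued requests, run a single
# forward pass over the batch, write results to per-request keys. Queue/key
# names, batch size, and the stand-in model are illustrative assumptions.
import json
import redis
import torch

r = redis.Redis()
model = torch.nn.Linear(4, 2).eval()   # stand-in for a real PyTorch model
BATCH = 8

while True:
    # Block for the first request, then greedily drain up to BATCH - 1 more.
    _, first = r.blpop("infer:queue")
    items = [json.loads(first)]
    while len(items) < BATCH:
        nxt = r.lpop("infer:queue")
        if nxt is None:
            break
        items.append(json.loads(nxt))

    batch = torch.tensor([it["features"] for it in items], dtype=torch.float32)
    with torch.no_grad():
        preds = model(batch)

    for it, pred in zip(items, preds):
        r.set(f"infer:result:{it['id']}", json.dumps(pred.tolist()))
```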
Client/server system to perform distributed inference on high-load systems.
Different ways of implementing an API to serve an image classification model
Audio components for the geniusrise framework
Effortlessly Deploy and Serve Large Language Models in the Cloud as an API Endpoint for Inference