Repositories under the triton-inference-server topic.
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
My implementation of BiSeNet, including BiSeNetV2.
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
OpenAI-compatible API for the TensorRT-LLM Triton backend
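For context, a minimal sketch of how a client might call such an OpenAI-compatible endpoint; the base URL, API key, and model name below are assumptions for illustration only.

```python
# Sketch: querying an OpenAI-compatible endpoint in front of a
# TensorRT-LLM Triton backend. base_url, api_key, and the model
# name are placeholders, not values from this repository.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="ensemble",  # hypothetical model name registered on the server
    messages=[{"role": "user", "content": "Summarize what Triton Inference Server does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```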
Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.
The Triton backend for the ONNX Runtime.
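As background for entries like this one, a minimal Python client sketch for querying a model served through Triton over HTTP with the official tritonclient package; the model name and tensor names ("my_onnx_model", "input", "output") are placeholders.

```python
# Sketch: calling a Triton-served model over HTTP with tritonclient.
# Model and tensor names are placeholders for illustration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_onnx_model", inputs=[infer_input])
print(result.as_numpy("output").shape)
```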
ClearML - Model-Serving Orchestration and Repository Solution
Deploy a Stable Diffusion model with ONNX/TensorRT + Triton server
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Traffic analysis at a roundabout using computer vision
Deploy DL/ML inference pipelines with minimal extra code.
Diffusion Model for Voice Conversion
Build Recommender System with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask. Vector Recall, DeepFM Ranking and Web Application.
Compare multiple optimization methods on Triton to improve model service performance
Set up CI for DL (CUDA / cuDNN / TensorRT / onnx2trt / onnxruntime / onnxsim / PyTorch / Triton-Inference-Server / Bazel / Tesseract / PaddleOCR / NVIDIA-docker / MinIO / Supervisord) on AGX or PC from scratch.
Provides an ensemble model to deploy a YOLOv8 ONNX model to Triton
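As a starting point for entries like this one, exporting a YOLOv8 checkpoint to ONNX with the ultralytics package typically looks like the sketch below; the checkpoint file name is just the standard nano model and not specific to this repository.

```python
# Sketch: exporting YOLOv8 weights to ONNX before placing the file in a
# Triton model repository. "yolov8n.pt" is a placeholder checkpoint.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="onnx", dynamic=True)  # writes an .onnx file alongside the weights
```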
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server - multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
This repository is AI bootcamp material consisting of a workflow for computer vision
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments
Triton Inference Server + TensorRT + metrics
This project provides a pipeline for deploying and performing inference with the YOLOv8 object detection model using the Triton Inference Server. It supports integration with local systems, Docker-based setups, or Google Cloud’s Vertex AI. The repository includes scripts for automated deployment, benchmarks and GUI inference.
MagFace on Triton Inference Server using TensorRT
FastAPI middleware for comparing different ML model serving approaches
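A minimal sketch of the kind of FastAPI HTTP middleware such a comparison could build on, recording per-request latency; the header name and route are illustrative assumptions, not this repository's API.

```python
# Sketch: FastAPI middleware that measures per-request processing time.
# "X-Process-Time" and the /health route are placeholder choices.
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Process-Time"] = f"{time.perf_counter() - start:.4f}"
    return response

@app.get("/health")
async def health():
    return {"status": "ok"}
```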
Triton Inference Server Web UI
A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.
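For reference, reading online features with Feast usually follows the pattern below before the feature vector is sent to the model server; the feature view, feature names, and entity key are hypothetical.

```python
# Sketch: fetching online features from Feast (backed here by Redis).
# Feature and entity names are placeholders for illustration.
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_stats:avg_purchase", "user_stats:session_count"],
    entity_rows=[{"user_id": 1001}],
).to_dict()
print(features)
```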
Miscellaneous codes and writings for MLOps
Deploy KoGPT with Triton Inference Server
Provides an ensemble model to deploy a YOLOv8 TensorRT model to Triton
The purpose of this repository is to create a DeepStream/Triton Server sample application that uses YOLOv7, YOLOv7-QAT, and YOLOv9 models to perform inference on video files or RTSP streams.
Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.
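Not this repository's actual API, but a hedged sketch of what a thin wrapper around an OVMS KServe-v2-style REST endpoint could look like; the URL, port, model name, and tensor names are assumptions.

```python
# Sketch: minimal REST call against an OpenVINO Model Server instance,
# assuming it exposes the KServe v2 inference protocol on port 8000.
# Model and tensor names are placeholders.
import numpy as np
import requests

def infer(model_name, input_name, array, url="http://localhost:8000"):
    payload = {
        "inputs": [{
            "name": input_name,
            "shape": list(array.shape),
            "datatype": "FP32",
            "data": array.flatten().tolist(),
        }]
    }
    resp = requests.post(f"{url}/v2/models/{model_name}/infer", json=payload)
    resp.raise_for_status()
    return resp.json()

result = infer("resnet", "input", np.random.rand(1, 3, 224, 224).astype(np.float32))
print(result["outputs"][0]["shape"])
```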