Repositories under the triton-inference-server topic:
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
My implementation of BiSeNet, with BiSeNetV2 added.
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
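A minimal client-side sketch of querying such a deployment with Triton's Python HTTP client. The model name (yolov4), tensor names, and input shape below are assumptions for illustration; check the model's config.pbtxt for the real values.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Model name, tensor names, and shape are assumptions, not the repo's actual values.
image = np.random.rand(1, 3, 608, 608).astype(np.float32)
inp = httpclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("detections")

result = client.infer(model_name="yolov4", inputs=[inp], outputs=[out])
print(result.as_numpy("detections"))
```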
ClearML - Model-Serving Orchestration and Repository Solution
Deploy a Stable Diffusion model with ONNX/TensorRT + Triton Inference Server
The Triton backend for the ONNX Runtime.
Hardware-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Deploy DL/ML inference pipelines with minimal extra code.
OpenAI compatible API for TensorRT LLM triton backend
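Such a shim lets the stock openai Python client talk to the TensorRT-LLM deployment. A minimal sketch, assuming the gateway listens at localhost:8000/v1 and exposes a model named tensorrt_llm (both assumptions):

```python
from openai import OpenAI

# Point the standard OpenAI client at the shim instead of api.openai.com.
# Base URL and model name are assumptions; use whatever the gateway exposes.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="tensorrt_llm",
    messages=[{"role": "user", "content": "Summarize what Triton Inference Server does."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```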
Set up a CI environment for DL from scratch on AGX or PC: CUDA, cuDNN, TensorRT, onnx2trt, onnxruntime, onnxsim, PyTorch, Triton Inference Server, Bazel, Tesseract, PaddleOCR, NVIDIA Docker, MinIO, and Supervisord.
Compare multiple optimization methods on Triton to improve model-serving performance
Build a recommender system with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask: vector recall, DeepFM ranking, and a web application.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT, multi-format Triton server). Supported Triton model formats: TensorRT engine, TorchScript, ONNX.
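The PyTorch -> ONNX leg of such a converter typically boils down to a torch.onnx.export call. A minimal sketch with a stand-in module, since the real network, tensor names, input size, and opset are deployment-specific assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the loaded CRAFT network; shapes and names are assumptions.
class TinyDetector(nn.Module):
    def forward(self, x):
        return x.mean(dim=1, keepdim=True)

model = TinyDetector().eval()
dummy = torch.randn(1, 3, 768, 768)

torch.onnx.export(
    model,
    dummy,
    "craft.onnx",
    input_names=["input"],
    output_names=["scores"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
    opset_version=13,
)
# The ONNX file can then be built into a TensorRT engine, for example with:
#   trtexec --onnx=craft.onnx --saveEngine=craft.plan
```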
Diffusion Model for Voice Conversion
Provides an ensemble model to deploy a YOLOv8 ONNX model to Triton
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
MagFace on Triton Inference Server using TensorRT
FastAPI middleware for comparing different ML model serving approaches
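One common way to make serving approaches comparable behind a single FastAPI app is a timing middleware. A minimal sketch, not the repo's actual code; the header name and endpoint are illustrative:

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

# Record per-request latency so different serving backends mounted on the
# same app can be compared under identical conditions.
@app.middleware("http")
async def add_latency_header(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Latency-Ms"] = f"{(time.perf_counter() - start) * 1000:.2f}"
    return response

@app.get("/ping")
async def ping():
    return {"status": "ok"}
```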
Deploy KoGPT with Triton Inference Server
A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.
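At serving time, the online-store half of that pattern is a Feast lookup backed by Redis. A minimal sketch, assuming a Feast repo in the current directory; the feature view, feature names, and entity key are hypothetical:

```python
from feast import FeatureStore

# Feature names and entity key are assumptions for illustration; the real
# ones come from the repo's Feast feature definitions.
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=["user_stats:purchase_count", "user_stats:avg_order_value"],
    entity_rows=[{"user_id": 1001}],
).to_dict()
print(features)
```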
This repository contains AI bootcamp material consisting of a computer vision workflow
C++ application to perform computer vision tasks using Nvidia Triton Server for model inference
Python wrapper class for OpenVINO Model Server (OVMS). Users can submit inference requests to OVMS with just a few lines of code.
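The repo's own wrapper class isn't shown here; for comparison, the official ovmsclient package achieves the same few-lines pattern. The server address, model name, and input name below are assumptions:

```python
import numpy as np
from ovmsclient import make_grpc_client

# Not the repo's wrapper; this uses the separate ovmsclient package to
# illustrate the pattern. Address, model name, and input name are assumptions.
client = make_grpc_client("localhost:9000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = client.predict(inputs={"input": data}, model_name="resnet")
print(result)
```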
Triton face detection & recognition
TensorFlow Lite backend with ArmNN delegate support for Nvidia Triton
Triton Inference Server Web UI
Triton-PyTorch custom operator tutorial
MLModelService wrapping Nvidia's Triton Server
Go gRPC client for YOLO-NAS and YOLOv8 inference using the Triton Inference Server.
Example of deploying a PyTorch model to Triton Inference Server via the MLflow model registry
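The registry half of that flow is a standard mlflow.pytorch.log_model call; a deployment step can later pull the registered version and copy it into Triton's model repository. A minimal sketch, with the registered model name as an assumption:

```python
import mlflow
import torch.nn as nn

# Toy model standing in for the real network; the registered name is an
# assumption for illustration.
model = nn.Linear(4, 2)

with mlflow.start_run():
    mlflow.pytorch.log_model(
        model,
        artifact_path="model",
        registered_model_name="triton-demo-model",
    )
```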
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. The repository uses a tech stack including YOLOv8, ONNX, EasyOCR, Triton Inference Server, OpenCV, MinIO, Docker, and K8s, all deployed on a K80 GPU with CUDA 11.4.
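Triton's Python backend expects a model.py exposing a TritonPythonModel class whose execute method maps a batch of requests to responses. A skeletal sketch; the tensor names INPUT0/OUTPUT0 and the normalization step are assumptions that must match the model's config.pbtxt:

```python
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Tensor names are assumptions; they must match config.pbtxt.
            image = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            # Example post-processing step: normalize pixel values to [0, 1].
            processed = (image / 255.0).astype(np.float32)
            out = pb_utils.Tensor("OUTPUT0", processed)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```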
Miscellaneous codes and writings for MLOps