There are 4 repositories under the fastertransformer topic.
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
Deploy KoGPT with Triton Inference Server
Tutorial on how to deploy a scalable autoregressive causal language model (transformer) using NVIDIA Triton Inference Server
This repository is a code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
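The repositories above all follow the same deployment pattern: a Triton Inference Server container with the FasterTransformer backend scheduled onto GPU nodes. A minimal sketch of such a Kubernetes Deployment is shown below; the image name, model-repository path, and resource values are illustrative assumptions, not taken from any of the listed repositories.

```yaml
# Hypothetical sketch: Triton + FasterTransformer on Kubernetes.
# The container image and model path are placeholders — the
# FasterTransformer backend is typically built into a custom image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-fastertransformer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-fastertransformer
  template:
    metadata:
      labels:
        app: triton-fastertransformer
    spec:
      containers:
        - name: triton
          image: example.registry/triton-with-ft:latest  # assumed custom build
          args:
            - tritonserver
            - --model-repository=/models  # assumed mount point
          ports:
            - containerPort: 8000  # HTTP
            - containerPort: 8001  # gRPC
            - containerPort: 8002  # metrics
          resources:
            limits:
              nvidia.com/gpu: 1  # one GPU per replica (assumption)
```

On GKE, the `nvidia.com/gpu` resource limit causes the pod to be scheduled onto a GPU node pool; the model repository would usually be mounted from a persistent volume or cloud storage.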