Baseten's repositories
truss-examples
Examples of models deployable with Truss
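Several repositories in this list package a model as a "Truss" for deployment. As a rough sketch of what that involves, a Truss includes a `model.py` exposing a `Model` class with `load` and `predict` methods; the body below is a minimal illustrative placeholder, not code from any of these repos.

```python
# Minimal sketch of a Truss-style model.py (illustrative; the trivial
# uppercase "model" is a stand-in for real weight loading and inference).
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Real Trusses load model weights here, once, at server startup.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Called per request with the deserialized input.
        return self._model(model_input)
```

The split between `load` and `predict` lets the serving infrastructure pay the weight-loading cost once per replica rather than per request.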
starcoder-truss
Truss for deploying Starcoder to Baseten or other platforms
falcon-7b-truss
Truss for deploying Falcon 7B
infrastructure-take-home
Baseten infrastructure recruiting take home
ControlNet
Let us control diffusion models
pygmalion-6b-truss
A Truss to deploy Pygmalion 6B on Baseten.
chainlit-cookbook
Chainlit's cookbook repo
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs atop Kubernetes
kaniko
Build Container Images In Kubernetes
langchain
⚡ Building applications with LLMs through composability ⚡
mpt-7b-base-truss
A deployment "truss" for the MPT-7B Base model from MosaicML
python_backend
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
tensorrtllm_backend
The Triton TensorRT-LLM Backend
triton-inference-server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
truss-public-gh-repo-test
A public GitHub repo for testing the Truss deploy flow