Beast code in Giters

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

Language: C++ | License: Apache-2.0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT

Qwen-7B

The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud.

License: NOASSERTION

text-generation-inference

Large Language Model Text Generation Inference

License: NOASSERTION

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0

volcano

A Cloud Native Batch System (Project under CNCF)

License: Apache-2.0

fredbjer

fredchen's repositories

CC_Cat

clip-as-service

DeepSpeedExamples

FasterTransformer

juicefs

k8s-client-go

k8s-examples

lsp-kubeutil

Megatron-DeepSpeed

nanoGPT

pytorch

stable-diffusion-webui

stable-diffusion-webui-docker

TensorRT

triton

Qwen-7B

text-generation-inference

vllm

volcano