llm-inference-solutions

A collection of available inference and serving solutions for LLMs.

| Name | Org | Description |
| --- | --- | --- |
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs |
| Text-Generation-Inference | Hugging Face 🤗 | Large language model text generation inference |
| llm-engine | Scale AI | Scale LLM Engine public repository |
| DeepSpeed | Microsoft | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | NVIDIA | An optimized cloud and edge inferencing solution |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | NVIDIA | An easy-to-use Python API to define large language models and build TensorRT engines |
| mistral.rs | mistral.rs | Blazingly fast LLM inference |
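
Most of these engines follow a similar pattern: load a model, set sampling parameters, and generate, with batching and memory management handled by the engine. As a minimal sketch of what that looks like in practice, here is offline batch inference with vLLM (assuming vLLM is installed via `pip install vllm`; the model ID and sampling values below are placeholders for illustration, not recommendations):

```python
from vllm import LLM, SamplingParams

# Prompts submitted as one batch; vLLM schedules them together
# (continuous batching) to keep the GPU busy.
prompts = [
    "The capital of France is",
    "In one sentence, explain what an inference engine does:",
]

# Illustrative sampling settings; tune these for your own workload.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Placeholder model ID; any Hugging Face model that vLLM supports works here.
llm = LLM(model="facebook/opt-125m")

# Generate completions for the whole batch and print each result.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

The server-oriented entries in the table (Text-Generation-Inference, Triton Inference Server, OpenLLM, TensorRT-LLM) wrap the same load/sample/generate loop behind an HTTP or gRPC endpoint instead of an in-process Python object.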

About

A collection of available inference and serving solutions for LLMs.

License: MIT