llm-serving

There are 49 repositories under llm-serving topic.

ray-project / ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
ray distributed parallel machine-learning reinforcement-learning deep-learning python rllib hyperparameter-search optimization data-science automl hyperparameter-optimization model-selection java serving deployment pytorch tensorflow llm-serving
Language:Python 31386
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda gpt inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm trainium transformer
Language:Python 19525
bentoml / OpenLLM
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud.
llm llmops model-inference falcon fine-tuning stablelm llm-serving llama mpt vicuna bentoml llama2 llm-inference llm-ops open-source-llm openllm ai mistral ml mlops
Language:Python 8926
BentoML
bentoml / BentoML
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
model-serving mlops llmops generative-ai llm-inference model-inference-service inference-platform deep-learning llm-serving machine-learning python multimodal ml-engineering llm
Language:Python 6599
liguodongiot / llm-action
本项目旨在分享大模型相关技术原理以及实战经验。
llm llm-inference llm-serving llm-training llmops
Language:HTML 6557
skypilot-org / skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
cloud-computing data-science deep-learning gpu hyperparameter-tuning machine-learning tpu job-queue job-scheduler cloud-management distributed-training ml-infrastructure multicloud spot-instances ml-platform cost-management cost-optimization finops llm-serving llm-training
Language:Python 5723
superduperdb
SuperDuperDB / superduperdb
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
ai mlops torch transformers mongodb python pytorch ml database data inference distributed-ml llm-inference pretrained-models chatbot semantic-search llm-serving llmops vector-search rag
Language:Python 4406
microsoft / aici
AICI: Prompts as (Wasm) Programs
ai rust wasm wasmtime inference language-model llm llm-framework llm-inference llm-serving llmops model-serving transformer
Language:Rust 1758
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers
Language:Python 1605
ray-project / ray-llm
RayLLM - LLMs on Ray
distributed-systems large-language-models ray serving transformers llm llm-inference llm-serving llmops
Language:Python 1173
mosec
mosecorg / mosec
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
model-serving deep-learning machine-learning nerual-network mlops machine-learning-platform hacktoberfest gpu python pytorch tensorflow llm jax llm-serving rust cv mxnet tts
Language:Python 712
hpcaitech / SwiftInfer
Efficient AI Inference & Serving
artificial-intelligence deep-learning gpt inference llama llama2 llm-inference llm-serving
Language:Python 436
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
gpt inference llama llm llm-serving llmops model-serving
Language:C++ 383
rohan-paul / LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
gpt-3 gpt3-turbo large-language-models llama2 llm llm-finetuning llm-inference llm-serving llm-training mistral-7b open-source-llm pytorch
Language:Jupyter Notebook 352
ray-project / ray-educational-materials
This is suite of the hands-on training materials that shows how to scale CV, NLP, time-series forecasting workloads with Ray.
deep-learning distributed-machine-learning generative-ai llm llm-inference llm-serving ray ray-data ray-distributed ray-serve ray-train ray-tune
Language:Jupyter Notebook 294
substratusai / runbooks
Finetune LLMs on K8s by using Runbooks
kubernetes mlops llmops kubernetes-operator llm-inference llm-serving llm-training ml-platform
Language:Go 157
chenhunghan / ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
ai helm kubernetes langchain llm python openai cloudnative ggml gpu llamacpp cuda gptq llm-inference llm-serving
Language:Python 140
slai-labs / get-beam
Run GPU inference and training jobs on serverless infrastructure that scales with you.
artificial-intelligence machine-learning python serverless cloud-computing cost-optimization data-science deep-learning distributed-computing gpu-acceleration gpu-computing hpc llm-serving llm-training ml-infrastructure mlops serverless-architectures
Language:Shell 88
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration, and related works will be gradually added in the future. Welcome contributions!
llm-inference llm-serving paperlist papers system
64
HPMLL / BurstGPT
A GPT-3.5 & GPT-4 Workload Trace to Optimize LLM Serving Systems
dataset llm llm-serving mlsys
61
sugarcane-ai / sugarcane-ai
npm like package ecosystem for Prompts 🤖
ai llm prompt-engineering framework llm-chain llm-framework llm-serving llm-training llmops next packages performance prompts llm-finetuning
Language:TypeScript 46
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
llm-inference llm-serving llmops
43
friendliai / friendli-client
Friendli: the fastest serving engine for generative AI
generative-ai llm llm-inference llmops serving gpt gpt3 inference llama2 llm-serving llms inference-engine inference-server ai llm-ops mistral ml mlops stable-diffusion
Language:Python 34
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
inference llm llm-serving llmops mlops model-serving pytorch ray transformer vllm
Language:Python 29
AntonioGr7 / pratical-llms
A collection of hand on notebook for LLMs practitioner
genai llm llm-evaluation llm-inference llm-serving llm-training quantization
Language:Jupyter Notebook 28
ray-project / llms-in-prod-workshop-2023
Deploy and Scale LLM-based applications
anyscale llm llm-inference llm-serving llms ray
Language:Jupyter Notebook 23
OSS-Pole-Emploi / happy_vllm
A REST API for vLLM, production ready
api-rest llm llm-serving mlops production serving transformers vllm
Language:Python 21
oscinis-com / Awesome-LLM-Productization
Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
ai large-language-models llm llm-inference llm-serving llmops machine-learning mlops production transformer awesome awesome-list
19
sugarcane-ai / sugarcane-ai.github.io
api architecture fine-tuning framework generative-ai llm llm-serving llm-training llmops microservices prompt prompt-engineering prompts reusable training typescript workflow
Language:Astro 15
Neural-Dragon-AI / Cynde
A Framework For Intelligence Farming
autoscaling llm-inference llm-serving modal-labs openai-api polars pydantic xgboost intelligence-farming pydantic-logfire
Language:Python 11
ray-project / anyscale-berkeley-ai-hackathon
Ray and Anyscale for UC Berkeley AI Hackathon!
anyscale berkeley-ai hackathon llm llm-inference llm-serving ray-distributed
Language:Jupyter Notebook 11
Awesome-LLMs-ICLR-24
azminewasi / Awesome-LLMs-ICLR-24
It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
large-language-model large-language-models large-language-models-and-translation-systems large-language-models-for-graph-learning llm llm-agent llm-evaluation llm-framework llm-inference llm-privacy llm-prompting llm-security llm-serving llm-training llmops llms pretrained-language-model pretrained-models pretrained-weights
10
ein-llm
ehsanghaffar / ein-llm
A self-hosted personal chatbot API with FastAPI. It allows you to interact with the Llama2 LLM (and other open-source LLMs) to have natural language conversations, generate text, and perform various language-related tasks.
langchain llama2 llamacpp llm-serving
Language:Jupyter Notebook 10
mddunlap924 / LLM-Inference-Serving
This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing low-latency, high-throughput, and cost-effective benefits for inference and serving.
deepspeed large-language-models llamacpp llamafile llm-inference llm-serving llms vllm
Language:Jupyter Notebook 8
IvanLuLyf / bunny-llm
Deno LLM API Service
chatgpt chatgpt-api cloudflare-workers-ai llm llm-serving
Language:TypeScript 6
ray-project / llm-application
anyscale llm llm-serving ray-distributed nlp scalable-machine-learning
Language:Jupyter Notebook 6

llm-serving

ray-project / ray

vllm-project / vllm

bentoml / OpenLLM

bentoml / BentoML

liguodongiot / llm-action

skypilot-org / skypilot

SuperDuperDB / superduperdb

microsoft / aici

predibase / lorax

ray-project / ray-llm

mosecorg / mosec

hpcaitech / SwiftInfer

alibaba / rtp-llm

rohan-paul / LLM-FineTuning-Large-Language-Models

ray-project / ray-educational-materials

substratusai / runbooks

chenhunghan / ialacol

slai-labs / get-beam

galeselee / Awesome_LLM_System-PaperList

HPMLL / BurstGPT

sugarcane-ai / sugarcane-ai

mani-kantap / llm-inference-solutions

friendliai / friendli-client

asprenger / ray_vllm_inference

AntonioGr7 / pratical-llms

ray-project / llms-in-prod-workshop-2023

OSS-Pole-Emploi / happy_vllm

oscinis-com / Awesome-LLM-Productization

sugarcane-ai / sugarcane-ai.github.io

Neural-Dragon-AI / Cynde

ray-project / anyscale-berkeley-ai-hackathon

azminewasi / Awesome-LLMs-ICLR-24

ehsanghaffar / ein-llm

mddunlap924 / LLM-Inference-Serving

IvanLuLyf / bunny-llm

ray-project / llm-application