There are 23 repositories under the vllm topic.
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods, covering single- and multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
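The "single line of code" swap typically means re-pointing an OpenAI SDK client at the local server. A minimal sketch, assuming a Xinference-style OpenAI-compatible endpoint on localhost (the port, API key handling, and model name here are illustrative assumptions):

```python
from openai import OpenAI

# Point the official OpenAI client at a local OpenAI-compatible server
# instead of api.openai.com. base_url and model name are assumptions.
client = OpenAI(
    base_url="http://localhost:9997/v1",  # hypothetical local endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # assumed name of a model registered on the server
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```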
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
Low latency JSON generation using LLMs ⚡️
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
An endpoint server for efficiently serving quantized open-source LLMs for code.
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing-resource management, monitoring, and more.
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
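As a rough idea of what such an integration looks like (a sketch, not the repo's actual code), here is a minimal Ray Serve deployment wrapping a vLLM engine; the model name and resource settings are assumptions for illustration:

```python
from ray import serve
from vllm import LLM, SamplingParams

@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMServer:
    def __init__(self):
        # Small model chosen only for illustration.
        self.llm = LLM(model="facebook/opt-125m")
        self.params = SamplingParams(max_tokens=128, temperature=0.7)

    async def __call__(self, request):
        # Ray Serve passes a Starlette request; expect a body like {"prompt": "..."}.
        prompt = (await request.json())["prompt"]
        outputs = self.llm.generate([prompt], self.params)
        return {"text": outputs[0].outputs[0].text}

app = VLLMServer.bind()
# serve.run(app)  # serves HTTP on port 8000 by default
```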
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.
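Constrained "JSON mode" decoding masks invalid tokens at each step, so the output is guaranteed to parse. A minimal sketch using the outlines library this server builds on; the model, schema, and exact API shape are version-dependent assumptions:

```python
from pydantic import BaseModel
import outlines

class Flight(BaseModel):
    origin: str
    destination: str
    seats: int

# Load a small model; the name is illustrative.
model = outlines.models.transformers("microsoft/phi-2")

# Build a generator whose sampling is constrained to match the Flight schema.
generator = outlines.generate.json(model, Flight)

flight = generator("Extract the flight: 'Two seats from SFO to JFK, please.'")
print(flight)  # e.g. Flight(origin='SFO', destination='JFK', seats=2)
```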
A production-ready REST API for vLLM.
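vLLM itself ships OpenAI-compatible routes, and querying any such REST front end looks roughly like this sketch (the endpoint and model name are assumptions):

```python
import requests

# Assumes a vLLM OpenAI-compatible server, e.g. started with `vllm serve <model>`,
# listening on localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",  # must match the model the server loaded
        "prompt": "vLLM serves LLMs by",
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```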
Carbon Limiting Auto Tuning for Kubernetes
This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing its low latency, high throughput, and cost-effectiveness for inference and serving.
Fully-featured, beautiful web interface for vLLM - built with NextJS.
Run code inference-only benchmarks quickly using vLLM
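A throughput micro-benchmark with vLLM's offline API takes only a few lines; this sketch (the model and workload are assumptions) times a batch of code completions:

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model for illustration
prompts = ["def fibonacci(n):"] * 64  # toy code-completion workload
params = SamplingParams(max_tokens=64, temperature=0.0)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s ({generated / elapsed:.0f} tok/s)")
```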
Ready-to-deploy Docker image for Functionary LLM served as an OpenAI-Compatible API.
Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.
A simple implementation of UNet, because all the implementations I've seen are way too complicated.
Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.
A Large Language Model based tool for generating human-like responses to natural language inputs on networks not connected to the internet.
Cog wrapper for cognitivecomputations/Wizard-Vicuna-13B-Uncensored