You can click on **Serverless vLLM** at the top of the **Explore Templates** page on RunPod. Be sure to specify your model name and maximum model length (refer to the docs).
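As an illustration, the template's environment settings might look like the following. The variable names `MODEL_NAME` and `MAX_MODEL_LEN` and the example length of 8192 are assumptions here; confirm the exact names and a sensible value for your model in the RunPod docs:

```
MODEL_NAME=cognitivecomputations/dolphin-2.9.1-llama-3-8b
MAX_MODEL_LEN=8192
```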
```bash
git clone https://github.com/ashleykleynhans/runpod-vllm-scripts.git
cd runpod-vllm-scripts
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```
**Note:** The `MODEL` value is the path to the model on Hugging Face Hub, not the full URL. For example, `https://huggingface.co/cognitivecomputations/dolphin-2.9.1-llama-3-8b` becomes `cognitivecomputations/dolphin-2.9.1-llama-3-8b`.
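If you only have the Hub URL to hand, the conversion is just dropping the domain. A minimal sketch (the helper name `hf_model_path` is mine, not part of the repo):

```python
from urllib.parse import urlparse

def hf_model_path(url: str) -> str:
    """Strip the Hugging Face Hub domain from a model URL, leaving
    the org/model path that the MODEL setting expects."""
    return urlparse(url).path.strip("/")

print(hf_model_path("https://huggingface.co/cognitivecomputations/dolphin-2.9.1-llama-3-8b"))
# cognitivecomputations/dolphin-2.9.1-llama-3-8b
```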
- Copy the file `.env.example` to a new file called `.env`.
- Edit the `.env` file and set your RunPod API key, RunPod Endpoint ID, and the path to the model on Hugging Face Hub.
- Save the file.
| Script Name | Description |
|---|---|
| `chat_completions.py` | Chat completions without streaming |
| `chat_completions_streaming.py` | Chat completions with streaming |
| `completions.py` | Completions without streaming |
| `completions_streaming.py` | Completions with streaming |
| `list_models.py` | List available models |
Once you have installed the scripts and activated the virtual environment, you can run any of the above scripts, for example:

```bash
python3 chat_completions_streaming.py
```
You will need to edit the scripts to set your own prompt, messages, model, and so on.
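For orientation, here is a rough sketch of what a non-streaming chat-completion call against a RunPod serverless vLLM endpoint can look like. It assumes the endpoint exposes RunPod's OpenAI-compatible API under `/openai/v1` and that the `openai` package is installed; the helper names are mine, not the repo's, so treat this as an illustration rather than the scripts' actual code:

```python
import os

def openai_base_url(endpoint_id: str) -> str:
    # RunPod serverless endpoints expose an OpenAI-compatible API under
    # this path (assumption -- confirm against the current RunPod docs).
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"

def chat_once(prompt: str) -> str:
    """Send one chat-completion request. Requires RUNPOD_API_KEY and
    RUNPOD_ENDPOINT_ID in the environment, plus `pip install openai`."""
    from openai import OpenAI  # imported lazily so the helper above works without it
    client = OpenAI(
        api_key=os.environ["RUNPOD_API_KEY"],
        base_url=openai_base_url(os.environ["RUNPOD_ENDPOINT_ID"]),
    )
    response = client.chat.completions.create(
        model="cognitivecomputations/dolphin-2.9.1-llama-3-8b",  # your MODEL value
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The streaming variants differ mainly in passing `stream=True` and iterating over the response chunks instead of reading one message.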