vLLM + Docker + Traefik



A quick guide to deploying a production-ready, secure Mistral-7B or Mixtral-8x7B vLLM server behind a Traefik reverse proxy and load balancer. By default it uses an HTTPS Let's Encrypt certificate with automated renewal, and comes with a password-protected Traefik dashboard.

Note

  • You can disable the Traefik dashboard, or keep it accessible internally only
  • You can substitute Let's Encrypt with your own certificate, or use Cloudflare or another third party for the DNS challenge
  • Add CrowdSec or a WAF through Cloudflare to secure your DNS zone against bots and other security threats

How to deploy on a Public GPU Server with a Real Domain Name

Step 1. Install required dependencies

  • Git
  • Docker ~ v24
  • Docker Compose ~ v2
  • CUDA 12 (NVIDIA Ampere- or Hopper-generation GPU)

Check the guides below for installing Docker and Docker Compose on Ubuntu 20.04.

Note: To use an NVIDIA GPU you will need Docker Compose version > v1.28.0; otherwise you will get an error.
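
Before proceeding, a quick sanity check (a sketch; exact version strings will vary with your install):

docker --version          # expect Docker ~v24
docker compose version    # expect Compose v2 (must be > v1.28.0 for GPU support)
nvidia-smi                # confirms the NVIDIA driver sees your GPU(s)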

Step 2. Clone the repository

git clone https://github.com/hurshd0/vllm-docker-traefik.git && cd vllm-docker-traefik

Step 3. Create .env file

mv .env.example .env
nano .env

Fill in your environment variables

# Traefik settings
ADMIN_EMAIL=<admin@your-domain.com> # e.g. ADMIN_EMAIL=admin@website.com
DOMAIN=<yourwebsite.com> # e.g. DOMAIN=website.com
CERT_RESOLVER=letsencrypt # leave blank for an internal, private, or local network
TRAEFIK_USER=admin
TRAEFIK_PASSWORD_HASH=<your-password-hash> # e.g. TRAEFIK_PASSWORD_HASH=$2y$10$OfEBpHk52P/5Ad1qzDj79esMnuhaEbV5of7OBTSurzhtSENLeWzAW 

# vLLM settings
MODEL_NAME=<your-huggingface-hub-model> # e.g. MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.1"
HF_TOKEN=<your-huggingface-hub-token> # e.g. HF_TOKEN="hf_XXXXXXXXXXXXXXXXX"

[Optional] Set your own Traefik Dashboard admin password

Note:

  • Install apache2-utils: sudo apt update && sudo apt install apache2-utils -y
  • Generate the password hash:

htpasswd -nBC 10 admin
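
This prints a user:hash pair on stdout; for illustration, using the sample hash from the .env snippet above (generate your own):

admin:$2y$10$OfEBpHk52P/5Ad1qzDj79esMnuhaEbV5of7OBTSurzhtSENLeWzAW

Copy everything after admin: into TRAEFIK_PASSWORD_HASH in .env. Note that if a hash is pasted directly into docker-compose.yml labels instead of .env, each $ must be doubled to $$ so Compose does not treat it as variable interpolation.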

Get Huggingface Hub Token

  • Get your HF_TOKEN from the Huggingface Hub

  • You can leave CERT_RESOLVER empty if you want to test a local deployment:

CERT_RESOLVER=

Step 4. Configure the vLLM Server for a SINGLE or MULTI-GPU setup

A. For SINGLE GPU setup

Start the vLLM server

docker compose up -d

Check logs

docker compose logs
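
Once the containers are up, you can smoke-test the deployment. This sketch assumes the compose file exposes vLLM's OpenAI-compatible API and that Traefik routes your domain to it; substitute your own DOMAIN and MODEL_NAME values from .env:

curl https://yourwebsite.com/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-7B-Instruct-v0.1", "prompt": "San Francisco is a", "max_tokens": 16}'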

B. For MULTI-GPU setup serving Mixtral-8x7B

Get NVIDIA Device IDs

$ nvidia-smi -L
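
Each GPU is listed on its own line; for example, on a hypothetical two-GPU machine (your model names and UUIDs will differ):

GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)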

Append the device IDs to the device_ids list:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0'] 
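
For example, to reserve two GPUs (device IDs '0' and '1') for Mixtral-8x7B, assuming the same compose layout:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0', '1'] # one entry per GPU from `nvidia-smi -L`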

Customize the Docker Compose YAML file

Change the vllm section of the docker-compose.yml file:

    command:
      --model ${MODEL_NAME}
      --tensor-parallel-size 2 # set to the GPU count; should be an even number of GPUs
      --load-format pt # needed since both `pt` and `safetensors` weights are available
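
After editing, recreate the service so the new flags take effect:

docker compose up -d --force-recreate
docker compose logs -f # follow the logs while the model loads across both GPUs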
