pulumiverse / katwalk

LLM model runway server

Katwalk Server - LLM API Server and IaC

This repository is under construction.
Encouragement, bug reports, and PRs with enhancements are welcome!

About:

Katwalk is an LLM API server.
This repository contains all the code required to build and deploy a containerized LLM API service.

Deployment options:

  • Locally with Docker
  • Runpod.io GPU Cloud (alpha)
  • Azure on ACI*
  • More to come

* Azure ACI containers take ~90 minutes to start, for reasons not yet diagnosed.

Limitations:

  • Katwalk server is a proof of concept (PoC) only at this time.
  • Tested only with `hf` format models from Huggingface.
  • Platform: linux/amd64 only at this time.
  • Requires CUDA support.
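
The platform limitations above can be checked before attempting a local deployment. The following is a minimal sketch, not part of Katwalk itself: `check_host` is a hypothetical helper, and looking for `nvidia-smi` on the PATH is only a heuristic for a working NVIDIA driver, not proof of CUDA support.

```python
import platform
import shutil
from typing import List


def check_host(system: str, machine: str, has_nvidia_smi: bool) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (linux only)")
    if machine.lower() not in ("x86_64", "amd64"):
        problems.append(f"unsupported architecture: {machine} (amd64 only)")
    if not has_nvidia_smi:
        # Heuristic only: the NVIDIA driver normally ships nvidia-smi.
        problems.append("nvidia-smi not found on PATH; CUDA driver may be missing")
    return problems


if __name__ == "__main__":
    issues = check_host(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )
    print("host looks compatible" if not issues else "\n".join(issues))
```

Passing the host facts in as arguments keeps the check easy to exercise on machines that do not meet the requirements.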

Navigate to the ./pulumi directory for instructions

Prompting a Katwalk server API with Insomnia; the response was generated by meta-llama/Llama-2-7b-chat-hf
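
The same kind of request can be made from a script instead of Insomnia. This README does not document the server's routes or request schema, so everything specific in the sketch below is an assumption: the address `http://localhost:8000`, the route `/v1/chat`, and the `prompt` JSON field are placeholders to replace with whatever your deployment actually exposes. `build_prompt_request` is a hypothetical helper that only builds the request.

```python
import json
import urllib.request


def build_prompt_request(base_url: str, route: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying a prompt."""
    url = base_url.rstrip("/") + "/" + route.lstrip("/")
    # The "prompt" field name is an assumption, not Katwalk's documented schema.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address and route; substitute the values for your deployment.
    req = build_prompt_request("http://localhost:8000", "/v1/chat", "Hello")
    print(req.get_method(), req.full_url)
```

To actually send it, pass the returned object to `urllib.request.urlopen` once the real route and payload shape are known.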

Configuration Index

# secret encryption salt
encryptionsalt: v1:Ce8NM940=:v1:j+XGC73Zqp:t34XyjytvHcK+G1luA==
config:

  #################################################################
  # General Deploy config
  #################################################################
  # Usage Keys:
  #   Keys:
  #     - pulumi config set <key> <value>
  #   Secret Keys:
  #     - pulumi config set --secret <secret_key> <secret_value>

  katwalk:deploy: "True" # Default: False, set to "True" to deploy
  katwalk:runtime: "runpod" # Default: docker, accepts: docker, runpod, azure
  katwalk:deploymentName: "katwalk" # Defaults to "katwalk"

  #################################################################
  # Docker Image Settings
  #################################################################

  # Docker Build & Push Settings
  katwalk:dockerBuild: "False" # Default: False, set to "True" to build container
  katwalk:dockerPush: "False" # Default: False, set to "True" to push container

  # Docker Image Name & Registry Settings
  katwalk:dockerTag: "20230829"
  katwalk:dockerName: "katwalk"
  katwalk:dockerProject: "usrbinkat" # Optional if same as dockerUser
  katwalk:dockerRegistry: "ghcr.io" # accepts: any oci registry service (e.g. ghcr.io, docker.io, etc.)

  # Registry Credentials
  katwalk:dockerUser: "usrbinkat"

  # Optional unless pushing to registry or deploying to Azure
  katwalk:dockerRegistrySecret: # accepts: oci registry api token or password as `pulumi config set --secret dockerRegistrySecret <token>`
    secure: v1:UUt+00x+9Tz1:IVMKQ/2J+5ydq5R/kuFlfp+v5yYDF8HcL3Vy9Vz8nTNKPU=

  #################################################################
  # Huggingface Settings
  #################################################################

  # Huggingface Model ID string
  katwalk:hfModel: "meta-llama/Llama-2-7b-chat-hf" # accepts: any `hf` format Huggingface model ID

  # Huggingface Credentials
  katwalk:hfUser: "usrbinkat" # accepts: Huggingface user name
  katwalk:hfToken: # accepts: Huggingface API auth token as `pulumi config set --secret hfToken <token>`
    secure: v1:5hwLBQ4KO:/DNzuZ7UfbMMOQGxF7d29a+nWm04YSspPDm11y79E=

  #################################################################
  # Docker Deploy Settings
  #################################################################

  # If deploying locally via Docker
  # Default: create & use a docker volume
  # Optional: set an absolute path on the local host to the models directory
  katwalk:modelsPath: /home/kat/models

  #################################################################
  # Azure Deploy Settings
  #################################################################

  katwalk:azureAciGpu: "V100" # accepts: V100, K80, P100
  katwalk:azureAciGpuCount: "1" # accepts: 1, 2, 4, 8

  #################################################################
  # Runpod Deploy Settings
  #################################################################

  # Runpod GPU Type
  # List of available GPU types in ./doc/runpod/README.md
  katwalk:runpodGpuType: "NVIDIA RTX A6000" # accepts: any valid Runpod GPU type

  # Runpod Credentials
  katwalk:runpodToken:
    secure: v1:2IqzPVRePwRwz:KbJzp+5L+khtSBbgW6FjPpdCQszP700xJAZVcrg/qBoo/pbgK=
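
The accepted values listed in the comments above can be checked before running a deployment. The sketch below is illustrative only and is not taken from the Katwalk source: `is_enabled`, `image_ref`, and `validate` are hypothetical helpers, the settings are passed as a plain dict with the `katwalk:` prefix removed, and the `registry/project/name:tag` layout is the usual OCI image reference convention, assumed here rather than confirmed by this README.

```python
from typing import List, Mapping, Optional

# Accepted values as listed in the config comments.
ALLOWED_RUNTIMES = {"docker", "runpod", "azure"}
ALLOWED_ACI_GPUS = {"V100", "K80", "P100"}
ALLOWED_ACI_GPU_COUNTS = {"1", "2", "4", "8"}


def is_enabled(value: Optional[str]) -> bool:
    """Treat the string "True" (any case) as on; missing or anything else as off."""
    return value is not None and value.strip().lower() == "true"


def image_ref(cfg: Mapping[str, str]) -> str:
    """Compose registry/project/name:tag; dockerProject falls back to dockerUser."""
    project = cfg.get("dockerProject") or cfg["dockerUser"]
    return f'{cfg["dockerRegistry"]}/{project}/{cfg["dockerName"]}:{cfg["dockerTag"]}'


def validate(cfg: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the settings; empty means none found."""
    errors = []
    runtime = cfg.get("runtime", "docker")  # documented default is docker
    if runtime not in ALLOWED_RUNTIMES:
        errors.append(f"runtime must be one of {sorted(ALLOWED_RUNTIMES)}, got {runtime!r}")
    if runtime == "azure":
        if cfg.get("azureAciGpu") not in ALLOWED_ACI_GPUS:
            errors.append(f"azureAciGpu must be one of {sorted(ALLOWED_ACI_GPUS)}")
        if cfg.get("azureAciGpuCount") not in ALLOWED_ACI_GPU_COUNTS:
            errors.append("azureAciGpuCount must be one of 1, 2, 4, 8")
    return errors


if __name__ == "__main__":
    cfg = {
        "deploy": "True",
        "runtime": "runpod",
        "dockerRegistry": "ghcr.io",
        "dockerUser": "usrbinkat",
        "dockerName": "katwalk",
        "dockerTag": "20230829",
    }
    print(is_enabled(cfg.get("deploy")), image_ref(cfg), validate(cfg))
```

The flags are compared as strings because the config stores them as quoted `"True"` and `"False"` values, not YAML booleans.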

What's Next?

ADDITIONAL RESOURCES:

License: Apache License 2.0

Languages: Python 80.7%, Dockerfile 9.2%, Vue 7.9%, HTML 1.7%, JavaScript 0.6%