danielezonca / caikit-tgis-serving

Caikit-TGIS-Serving

Caikit-tgis-serving is a combined container image that lets users perform large language model (LLM) inference.

It consists of several components:

  • TGIS: Serving backend, loads the models, and provides the inference engine
  • Caikit: Wrapper layer that handles the lifecycle of the TGIS process, provides the inference endpoints, and has modules to handle different model types
  • Caikit-nlp: Caikit module that handles NLP style models
  • KServe: Orchestrates model serving for all types of models; ServingRuntimes implement loading for specific types of model servers. KServe also handles the lifecycle of the deployment object, storage access, networking setup, etc.
  • Service Mesh (istio): Service mesh networking layer, manages traffic flows, enforces access policies, etc.
  • Serverless (knative): Allows for serverless deployments of models
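
To sketch how these components fit together, a KServe InferenceService that points at a Caikit-TGIS ServingRuntime might look like the following. Note that the resource name, runtime name, and storage URI below are illustrative assumptions, not values taken from this repository:

```yaml
# Hypothetical InferenceService: KServe resolves the ServingRuntime,
# Knative provides the serverless deployment, Istio handles the traffic.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                     # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: caikit                    # format handled by the Caikit runtime
      runtime: caikit-tgis-runtime      # assumed ServingRuntime name
      storageUri: s3://models/example-model   # illustrative model location
```

KServe watches this resource, mounts the model from the storage URI, and starts the Caikit/TGIS container defined by the referenced ServingRuntime.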

Installation

Prerequisites

  • OpenShift cluster
    • This doc is written based on a ROSA cluster and has also been tested on an OSD cluster
    • Many of the tasks in this tutorial require cluster-admin permissions (e.g., installing operators, configuring the service mesh, enabling HTTP/2)
    • 4 CPUs and 16 GB of memory on a node for inferencing (adjustable in the ServingRuntime deployment)
  • CLI tools
    • oc cli

How to install

The required operators are installed as part of the KServe/Caikit/TGIS stack installation instructions below.

There are two ways to install the KServe/Caikit/TGIS stack (both include the installation of the above-mentioned required operators):

  1. Script-based installation
  2. Manual installation
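
The script-based path typically boils down to cloning the repository and running its install script against a cluster you are logged in to with cluster-admin rights. The script path and placeholders below are assumptions for illustration; check the repository's installation docs for the actual entry point:

```shell
# Illustrative outline of a script-based install (script path is an assumption)
git clone https://github.com/danielezonca/caikit-tgis-serving.git
cd caikit-tgis-serving

# Log in to the cluster with cluster-admin permissions first
# (placeholders intentionally left unfilled)
oc login --token=<token> --server=<api-server-url>

# Run the installation script (hypothetical path)
./scripts/install/kserve-install.sh
```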

Demos with an LLM model
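
Once the stack is installed and a model is deployed, inference can be exercised over gRPC against the Caikit endpoints. The host, model id, and request payload below are illustrative assumptions; the service and method names follow the usual Caikit NLP runtime convention:

```shell
# Illustrative gRPC inference request against a deployed Caikit-TGIS endpoint
# (host and model id are assumptions for this sketch)
grpcurl -insecure \
  -d '{"text": "At what temperature does water boil?"}' \
  -H "mm-model-id: example-llm" \
  <inference-host>:443 \
  caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
```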

Architecture of the stack

(Diagram: KServe + Knative + Istio + Caikit-TGIS)

About

License: Apache License 2.0

