datastaxdevs / terraform-nvidia-runai-stack


DataStax Vector Pipeline — RAG and Document AI


Build a production-grade, GPU-ready RAG platform that turns your documents into answers — end to end, in minutes.

Why this repo

  • Spin up multi-cloud GPU infrastructure (GKE/EKS/AKS) with sane defaults and HPA profiles
  • Ingest, chunk, and embed with NVIDIA NIMs; store vectors in Astra DB/HCD; search at scale
  • Serve retrieval APIs with cross-encoder reranking and observability built in
  • CLI-first operations: deploy, validate, monitor, and optimize in a few commands

TL;DR (60 seconds)

# 1) Bootstrap env
cp env.template .env  # set NGC_API_KEY, GCP_PROJECT_ID, GCP_ZONE, GKE_CLUSTER_NAME

# 2) Provision (example: GKE)
terraform -chdir=deployment apply \
  -var gcp_project_id="$GCP_PROJECT_ID" \
  -var gcp_zone="$GCP_ZONE" \
  -var gke_name_prefix="$GKE_CLUSTER_NAME"

# 3) One-command dev deploy (DNS-free)
./scripts/cli.sh deploy --profile dev
./scripts/cli.sh status --extended

What you get:

  • NVIDIA NIMs (embedder + reranker) deployed and wired to ingress
  • Retrieval stack with clean APIs, reranking, and performance timing
  • Terraform-managed infrastructure and day-2 scripts (monitoring, tuning)
  • Clear docs and examples to move from POC → production

Quick links: Docs · GKE Deploy · Scripts


🎯 Deployment Overview

This project uses a two-phase deployment approach:

  1. 🏗️ Infrastructure First: Use Terraform to create the Kubernetes cluster and supporting infrastructure
  2. 🚀 Applications Second: Use the CLI to deploy applications onto the existing infrastructure

Key Point: You must run Terraform commands BEFORE running CLI deployment commands. The CLI deploys applications onto infrastructure that Terraform creates.

Phase 1: Infrastructure Provisioning (Terraform)

Creates the cloud infrastructure (Kubernetes cluster, networking, bastion host, etc.)

Phase 2: Application Deployment (CLI)

Deploys applications and services onto the existing infrastructure


📋 Prerequisites

  • gcloud, terraform, kubectl, helm, jq installed and authenticated
  • Copy env.template to .env and fill any required values (minimal for dev)
  • Required: Run source scripts/setup_environment.sh to load environment variables before infrastructure deployment

Domain configuration for Run:AI:

  • Production: set RUNAI_DOMAIN to a DNS name you control and create a DNS record pointing to your ingress LoadBalancer.
  • Development (no DNS): use an IP-based hostname like runai.<LOAD_BALANCER_IP>.sslip.io (or nip.io). The installer uses this domain for Run:AI ingress automatically if none is provided.
  • Precedence: CLI/env values override .env.
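If you want to pin the development hostname yourself, one way to derive it is sketched below. The ingress-nginx service and namespace names are assumptions (adjust to your install), and on a private cluster the kubectl call typically runs via the bastion helper:

# Sketch: derive a DNS-free Run:AI domain from the ingress LoadBalancer IP.
# Assumptions: ingress-nginx deployed with its default service/namespace names.
LB_IP=$(kubectl -n ingress-nginx get svc ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export RUNAI_DOMAIN="runai.${LB_IP}.sslip.io"
echo "RUNAI_DOMAIN=${RUNAI_DOMAIN}"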

🏗️ Phase 1: Infrastructure Provisioning

Purpose: Create the Kubernetes cluster and supporting infrastructure

# 1. Set up environment
cp env.template .env
# Edit .env with your values - Required variables:
# - NGC_API_KEY=nvapi-...                    # NVIDIA API key for NeMo services
# - GCP_PROJECT_ID=your-gcp-project         # Your Google Cloud project ID
# - GCP_ZONE=us-central1-c                  # GCP zone for resources
# - GKE_CLUSTER_NAME=your-cluster-name      # Name for your GKE cluster
# 
# Optional but recommended for databases:
# - ASTRA_DB_ENDPOINT=                       # DataStax Astra DB endpoint
# - ASTRA_DB_TOKEN=                          # DataStax Astra DB token
# - HCD_DB_ENDPOINT=                         # HyperConverged Database endpoint  
# - HCD_DB_TOKEN=                            # HyperConverged Database token
#
# For troubleshooting terminal crashes:
# - TERRAFORM_DEBUG=true                     # Enable verbose terraform logging

source scripts/setup_environment.sh  # Load environment variables and setup

# 2. Provision infrastructure with Terraform
cd deployment
terraform init
terraform apply

# Note: If you experience terminal crashes during terraform operations, 
# add TERRAFORM_DEBUG=true to your .env file and re-run source scripts/setup_environment.sh

# 3. Verify infrastructure and test cluster access
cd ..
bastion kubectl get nodes            # Test cluster access

What this creates:

  • GKE cluster with GPU support
  • Bastion host for secure cluster access
  • VPC, subnets, and security groups
  • IAM roles and service accounts
  • Load balancer infrastructure
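To sanity-check what Terraform created, you can list its outputs and hit the cluster through the bastion. A short sketch, using the output names that appear in the bastion examples below:

# List every Terraform output for the deployment
terraform -chdir=deployment output

# Confirm GPU nodes registered and Ready (via the bastion helper)
bastion kubectl get nodes -o wide
bastion "kubectl describe nodes | grep nvidia"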

🔧 Environment Setup Script

The scripts/setup_environment.sh script provides several benefits:

  • Environment Loading: Automatically loads variables from your .env file
  • Bastion Function: Creates a convenient bastion command for cluster access
  • Authentication Check: Verifies your gcloud authentication status
  • Connectivity Test: Tests bastion connectivity (if infrastructure is deployed)
  • Helpful Tips: Provides usage examples and next steps

Usage: source scripts/setup_environment.sh (run once per terminal session)
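Under the hood, the bastion helper is essentially a wrapper around gcloud compute ssh driven by Terraform outputs. A minimal sketch (the real function in scripts/setup_environment.sh adds authentication and connectivity checks and may differ):

# Sketch of the bastion helper, assuming the Terraform outputs
# gke_bastion_name / gcp_project_id / gcp_zone used elsewhere in this README.
bastion() {
  local name project zone
  name=$(terraform -chdir=deployment output -raw gke_bastion_name)
  project=$(terraform -chdir=deployment output -raw gcp_project_id)
  zone=$(terraform -chdir=deployment output -raw gcp_zone)
  gcloud compute ssh "$name" --project "$project" --zone "$zone" --command="$*"
}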

🔗 Accessing Your Cluster via Bastion

The setup script creates a convenient bastion function for executing commands on your cluster. Here are the different ways to access your cluster:

Method 1: Using the Bastion Function (Recommended)

# Setup environment (run once per session)
source scripts/setup_environment.sh

# Execute commands on the cluster
bastion kubectl get nodes
bastion kubectl get pods --all-namespaces
bastion "kubectl describe nodes | grep nvidia"

Method 2: Direct SSH Connection

# Get bastion details
BASTION_NAME=$(terraform -chdir=deployment output -raw gke_bastion_name)
PROJECT_ID=$(terraform -chdir=deployment output -raw gcp_project_id)
ZONE=$(terraform -chdir=deployment output -raw gcp_zone)

# SSH to bastion
gcloud compute ssh --project $PROJECT_ID --zone $ZONE $BASTION_NAME

# Then run kubectl commands inside the SSH session
kubectl get nodes

Method 3: SSH with Inline Commands

# Execute single commands
gcloud compute ssh --project $PROJECT_ID --zone $ZONE $BASTION_NAME --command="kubectl get nodes"

Method 4: Manual gcloud SSH (if you know the details)

# Replace with your actual values
gcloud compute ssh --project gcp-lcm-project --zone us-central1-c vz-mike-obrien-bastion --command="kubectl get nodes"

💡 Pro Tips:

  • Use Method 1 (bastion function) for the best experience
  • The setup script automatically detects your infrastructure configuration
  • Use quotes for complex commands: bastion "kubectl get pods | grep nemo"
  • Run source scripts/setup_environment.sh in each new terminal session

🚀 Phase 2: Application Deployment

Purpose: Deploy applications onto the existing infrastructure

Development (DNS-free, quick start):

./scripts/cli.sh deploy --profile dev

# If the CLI seems to hang after "Loading environment...",
# skip local platform detection (e.g., when local kubectl is stale):
./scripts/cli.sh deploy --profile dev --platform gke

# Or explicitly disable detection:
./scripts/cli.sh deploy --profile dev --no-detect

Production (with domain/email):

./scripts/cli.sh deploy --profile prod --domain your-domain.com --email admin@your-domain.com

What this deploys:

  • GPU Operator for NVIDIA GPU management
  • NVIDIA NIMs (embedder + reranker) for AI services
  • NV-Ingest for document processing
  • NGINX ingress controller
  • TLS certificates (production only)

✅ Phase 3: Validate & Operate

# Check deployment status
./scripts/cli.sh status --extended
./scripts/cli.sh nims status
./scripts/cli.sh ingress status

# Validate everything is working
./scripts/cli.sh validate

# Access services (development)
./scripts/cli.sh port-forward                # dev/no DNS access
./scripts/cli.sh monitor lb --watch         # load balancing/HPA view

Access your services:

  • Reranker: http://reranker.<LOAD_BALANCER_IP>.nip.io (dev)
  • NV-Ingest: http://nv-ingest.<LOAD_BALANCER_IP>.nip.io (dev)
  • Production: Use your configured domain with TLS
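A quick smoke test against the dev endpoints is sketched below. The health paths are an assumption (NIM containers commonly expose /v1/health/ready); verify them against your deployed versions:

# Substitute your actual LoadBalancer IP
LB_IP=203.0.113.10
curl -s "http://reranker.${LB_IP}.nip.io/v1/health/ready"
curl -s "http://nv-ingest.${LB_IP}.nip.io/v1/health/ready"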

⚡ Quick Reference (For Experienced Users)

If you're familiar with the process, here's the essential sequence:

# 1. Infrastructure
cp env.template .env
# Edit .env (see Environment Configuration section for required values)
source scripts/setup_environment.sh && cd deployment && terraform init && terraform apply

# 2. Applications  
cd .. && ./scripts/cli.sh deploy --profile dev

# 3. Validate
./scripts/cli.sh validate && ./scripts/cli.sh status

📖 Detailed Setup Guide (Alternative)

For a more detailed walkthrough with troubleshooting tips, see the comprehensive setup guide below.

Prerequisites Installation

# macOS
brew install --cask google-cloud-sdk || true
brew install hashicorp/tap/terraform || true
gcloud auth login
gcloud auth application-default login

# Install kubectl and helm (see platform-specific instructions below)

Environment Configuration

cp env.template .env
# Edit .env and set required values:

# --- Required for Infrastructure ---
NGC_API_KEY=nvapi-...                    # NVIDIA API key from NGC
GCP_PROJECT_ID=your-gcp-project         # Your Google Cloud project ID  
GCP_ZONE=us-central1-c                  # GCP zone (or us-east1-b, etc.)
GKE_CLUSTER_NAME=your-cluster-name      # Name for your GKE cluster

# --- Required for Database (choose one) ---
# For DataStax Astra:
ASTRA_DB_ENDPOINT=https://your-db-id-region.apps.astra.datastax.com
ASTRA_DB_TOKEN=AstraCS:...

# OR for HyperConverged Database:
HCD_DB_ENDPOINT=https://your-hcd-endpoint
HCD_DB_TOKEN=your-hcd-token

# --- Optional Troubleshooting ---
TERRAFORM_DEBUG=true                     # Enable if experiencing terminal crashes

Alternative: Using tfvars file

# Instead of command-line variables, use a tfvars file:
terraform -chdir=deployment apply -var-file=../configs/gke/gke.tfvars
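A minimal tfvars file can carry the same variables shown in the TL;DR apply command. A sketch (variable names taken from that example; the real configs/gke/gke.tfvars may define additional values):

# Write a minimal tfvars with the variables from the TL;DR apply command
cat > configs/gke/gke.tfvars <<'EOF'
gcp_project_id  = "your-gcp-project"
gcp_zone        = "us-central1-c"
gke_name_prefix = "your-cluster-name"
EOF
terraform -chdir=deployment apply -var-file=../configs/gke/gke.tfvars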

Legacy Deployment Methods

For advanced users or troubleshooting, you can use the underlying scripts directly:

# Development profile (GPU Operator + NeMo; DNS-free)
scripts/platform/gke/deploy_to_bastion.sh --development

# NV-Ingest only (through bastion)
scripts/platform/gke/deploy_to_bastion.sh --deploy-nv-ingest

# Run:AI guided setup (optional)
scripts/cli.sh runai setup

Tips:

  • CLI flags override environment variables and .env values
  • Keep .env:GKE_CLUSTER_NAME consistent with Terraform's gke_name_prefix
  • Prefer running Kubernetes commands via the bastion resolved from Terraform outputs
  • You can override the Terraform directory with TF_ROOT if needed
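For example, pointing the CLI at a non-default Terraform directory via TF_ROOT per the last tip (the path here is a placeholder):

TF_ROOT=/path/to/alt/terraform ./scripts/cli.sh status --extended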

📐 Deployment Profiles & Sizing (Guidance)

Profile      GPUs (A100 equiv.)   Nodes   Expected Throughput   Notes
dev          0-2                  1-2     low                   DNS-free, quick validation
prod-small   4-8                  3-6     medium                HA; controller + ingress tuned
prod-large   8-16+                6-12    high                  aggressive HPA, multi-NIM scaling
  • Actual throughput varies by document mix and GPU types. See deploy_nv_ingest.sh for optimized presets and optimize_nv_ingest_performance.sh for day-2 tuning.

📚 Documentation (Full)

📚 Complete Documentation - Full project documentation with navigation


Quick links:

  • Scripts Quickstart: scripts/README.md
  • GKE Deployment Guide: docs/deployment/gke-deployment.md

Runtime data directories

The ingestion pipeline writes runtime data to several directories. You can colocate them under a single base directory via --data-dir (CLI) or DATA_DIR (env). CLI values override env, which override defaults.

Defaults (when no base is provided) follow OS conventions via platformdirs:

  • processed/error: user data dir
  • checkpoints: user state dir
  • temp/downloads: user cache dir

Optional env overrides for individual paths (CLI still wins): OUTPUT_DIR, ERROR_DIR, TEMP_DIR, CHECKPOINT_DIR, DOWNLOAD_DIR.

Suggested recipes:

  • Development: DATA_DIR=.data to keep the repo clean
  • Production (Linux): DATA_DIR=/var/lib/datastax-ingestion
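Putting the recipes above together (a sketch using the variable names from this section; the ingestion CLI's exact flags may differ):

# Development: colocate all runtime data in-repo
export DATA_DIR=.data

# Production (Linux): system-wide base, with one path overridden individually
export DATA_DIR=/var/lib/datastax-ingestion
export CHECKPOINT_DIR=/var/lib/datastax-ingestion/checkpoints
# A --data-dir CLI flag would still take precedence over both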

🏛️ System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   V2 Pipeline   │    │ Retrieval RAG   │    │ Infrastructure  │
│                 │    │                 │    │                 │
│ • Document Proc │    │ • Vector Search │    │ • Terraform     │
│ • NV-Ingest     │    │ • Azure OpenAI  │    │ • Kubernetes    │
│ • Vector DB     │    │ • Astra DB      │    │ • Monitoring    │
│ • Azure Blob    │    │ • Reranking     │    │ • Scripts       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │  NeMo Services  │
                    │                 │
                    │ • Microservices │
                    │ • GPU Accel     │
                    │ • Embeddings    │
                    │ • Reranking     │
                    └─────────────────┘

🎯 Use Cases

📄 Enterprise Document Processing

  • Large-scale ingestion from multiple sources (local, Azure Blob, etc.)
  • Intelligent chunking and embedding generation using NVIDIA models
  • Vector database storage with Astra DB or HCD for semantic search
  • Enterprise security with comprehensive SSL/TLS support

🔍 Intelligent Q&A Systems

  • Semantic document search with vector similarity and reranking
  • AI-powered answers using Azure OpenAI GPT-4o with streaming responses
  • Production monitoring with Arize integration and detailed metrics
  • Multi-database support for both cloud and on-premises deployments

🏗️ Infrastructure Automation

  • Multi-cloud deployment across AWS, GCP, and Azure platforms
  • Kubernetes orchestration with GPU support and auto-scaling
  • Terraform modules for reproducible infrastructure provisioning
  • Comprehensive monitoring with health checks and diagnostics

📋 Prerequisites

System Requirements

  • Python: 3.11 or higher (see Supported Version Ranges below)
  • Kubernetes: 1.20+ with GPU support
  • Terraform: 1.0+ for infrastructure automation
  • Docker: For containerized deployments

Cloud Services

  • Vector Database: DataStax Astra DB or HCD
  • GPU Compute: NVIDIA GPU-enabled clusters
  • Object Storage: Azure Blob Storage, AWS S3, or Google Cloud Storage
  • AI Services: NVIDIA API keys, Azure OpenAI deployments

Platform-Specific Installation

macOS

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install required tools
brew install --cask google-cloud-sdk
brew install hashicorp/tap/terraform  # HashiCorp tap (auto-tapped) for the latest version
brew install kubectl helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

Linux (Ubuntu/Debian)

# Update package list
sudo apt-get update

# Install required tools
sudo apt-get install -y apt-transport-https ca-certificates gnupg curl wget jq

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

Linux (CentOS/RHEL/Fedora)

# Install required tools
sudo dnf install -y curl wget jq  # or yum for older versions

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

Windows

# Install Chocolatey (if not already installed)
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install required tools
choco install -y gcloudsdk terraform kubernetes-cli kubernetes-helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

Note: After installation, verify all tools are available and meet version requirements:

gcloud --version          # no pinned version (latest recommended)
terraform --version       # >= 1.9.0, < 2.0.0 (see deployment/modules/kubernetes/versions.tf)
kubectl version --client  # compatible with Kubernetes 1.20+ (see System Requirements)
helm version              # Helm CLI v3.x; the Terraform Helm provider is pinned at ~> 2.17.0
jq --version              # no pinned version (latest recommended)

Supported Version Ranges:

  • Terraform: 1.9.0+ (required by infrastructure modules)
  • Kubernetes: 1.20+ with GPU support
  • Python: 3.11+ (required by ingestion and retrieval packages)
  • Helm: Terraform Helm provider ~> 2.17.0 (see versions.tf); Helm CLI v3.x

🧰 Helpful CLI Examples

# Controller patch (idempotent)
./scripts/cli.sh nginx controller --yes

# Ingress high throughput profile for all
./scripts/cli.sh nginx ingress --target all --profile high_throughput --yes

# Apply whitelist (CIDR CSV)
./scripts/cli.sh ingress whitelist --allowed-ips "1.2.3.4/32,5.6.7.0/24" --yes

# Validate deployment
./scripts/cli.sh validate

Note: default ingress upload limit is 3g. Override via INGRESS_MAX_BODY_SIZE env or the corresponding CLI flags.
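For example, raising the upload limit while re-applying the ingress profile (env var per the note above; consult the CLI help for the matching flag):

INGRESS_MAX_BODY_SIZE=10g ./scripts/cli.sh nginx ingress --target all --profile high_throughput --yes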

🏢 Run:ai Administration

Infrastructure Administrator Overview

The Infrastructure Administrator is the IT role responsible for installing, configuring, and maintaining the Run:ai product.

As part of the Infrastructure Administrator documentation you will find:

  • Install Run:ai
    • Understand the Run:ai installation
    • Set up a Run:ai Cluster
    • Set up Researchers to work with Run:ai
  • IT Configuration of the Run:ai system
  • Connect Run:ai to an identity provider
  • Maintenance & monitoring of the Run:ai system
  • Troubleshooting

For comprehensive Run:ai administration documentation, visit: NVIDIA Run:ai Infrastructure Administrator Guide

Note: The NVIDIA Run:ai docs are moving! For versions 2.20 and above, visit the new NVIDIA Run:ai documentation site. Documentation for versions 2.19 and below remains on the original site.

Run:ai Setup with this Project

This project includes integrated Run:ai deployment capabilities:

# Run:AI guided setup (optional)
./scripts/cli.sh runai setup

The setup process will guide you through configuring Run:ai for your specific infrastructure and requirements.

📚 Documentation Structure

docs/
├── README.md                    # Master documentation hub
├── components/                  # Component-specific guides
│   ├── v2-pipeline/            # Document processing pipeline
│   ├── retrieval-rag/          # Intelligent Q&A system
│   ├── infrastructure/         # Terraform & Kubernetes
│   └── nemo/                   # NVIDIA NeMo services
├── deployment/                 # Platform deployment guides
├── troubleshooting/            # Common issues & solutions
└── archive/                    # Historical documentation

🔐 Security & Enterprise Features

  • SSL/TLS encryption for all service communications
  • Certificate management with custom CA support
  • Network isolation using VPCs and security groups
  • Access control with RBAC and service accounts
  • Audit logging for compliance and monitoring
  • Data residency controls for sensitive information

📊 Monitoring & Observability

  • Performance metrics with millisecond-precision timing
  • Health checks for all system components
  • Resource utilization tracking and optimization
  • Arize AI integration for production monitoring
  • Custom dashboards for specific use cases
  • Error reporting and automated alerting

🤝 Contributing

We welcome contributions! Please see our component-specific documentation for detailed guidelines:

  • Documentation: Follow the component-based organization in docs/
  • Code: See individual component READMEs for specific guidelines
  • Testing: Comprehensive test suites available for all components
  • Issues: Use GitHub issues for bug reports and feature requests

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

🙏 Acknowledgments

  • DataStax: Astra DB vector database platform and enterprise support
  • NVIDIA: NeMo microservices, GPU acceleration, and AI model APIs
  • Microsoft: Azure OpenAI services and cloud infrastructure
  • Community: Open source contributors and enterprise partners

📖 Additional Resources

Ready to get started? Visit our Documentation Hub for complete setup guides and examples.
