MinBZK / ai-validation-infra

K8s infra for AI Validation team

AI Validation Infra

This is the infrastructure repository that deploys to a Kubernetes cluster.

Note: Changes to this repository's main branch are deployed to Kubernetes.

How to contribute

See contributing docs

Secrets management

Secrets are managed with SOPS. SOPS encrypts YAML (and other) files with a public key. The matching private key, which is configured in the CI/CD pipeline, decrypts the secrets.

# Encrypt the secret file in place
sops --encrypt -i apps/tad/overlays/production/secret-postgres.yaml
# Decrypt the secret file in place
sops --decrypt -i apps/tad/overlays/production/secret-postgres.yaml

By default, sops reads the public key used for encryption from the .sops.yaml file.
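For orientation, a .sops.yaml of this kind might look like the sketch below. It is not this repository's actual file: the path pattern, the choice of age keys, and the key placeholder are assumptions made for illustration. creation_rules, path_regex, encrypted_regex and age are standard SOPS configuration keys.

```shell
# Write an illustrative SOPS config into a scratch directory,
# so the repository's real .sops.yaml is never touched.
workdir="$(mktemp -d)"
cat > "$workdir/.sops.yaml" <<'EOF'
creation_rules:
  # Assumed pattern: match files named secret-*.yaml anywhere under apps/.
  - path_regex: 'apps/.*/secret-.*\.yaml$'
    # Encrypt only the values under data/stringData; keys and metadata stay readable.
    encrypted_regex: '^(data|stringData)$'
    # Placeholder, not a real key: the team's age public key would go here.
    age: 'AGE_PUBLIC_KEY_PLACEHOLDER'
EOF
# Show the generated example file.
cat "$workdir/.sops.yaml"
```

With a matching rule in place, sops can pick the encryption key from the config, so the commands above need no key flags.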

Deployment

The main branch is deployed to Kubernetes with Flux.
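As a rough illustration of what Flux reconciles, the sketch below writes a Flux Kustomization manifest for the tad production overlay. The resource name, the flux-system namespace, the GitRepository name and the sops-age Secret are all hypothetical, and it assumes Flux itself decrypts the SOPS files; this README only states that the private key is set in the CI/CD, so the real setup may differ.

```shell
# Write an illustrative Flux Kustomization into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/tad-kustomization.yaml" <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tad                        # hypothetical name
  namespace: flux-system           # assumed; Flux's default namespace
spec:
  interval: 10m                    # how often Flux re-applies the path
  path: ./apps/tad/overlays/production
  prune: true                      # delete cluster objects removed from git
  sourceRef:
    kind: GitRepository
    name: ai-validation-infra      # hypothetical source name
  decryption:
    provider: sops                 # assumption: Flux decrypts SOPS files
    secretRef:
      name: sops-age               # hypothetical Secret holding the private key
EOF
# Show the generated example manifest.
cat "$workdir/tad-kustomization.yaml"
```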

Labels

When you have many resources, it is important to label all of them, because otherwise they become unmanageable. We follow the Kubernetes best practices for labeling.
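One way to apply such labels consistently is through the kustomize labels field, sketched below. This assumes "best practices" refers to the recommended app.kubernetes.io/* label keys from the Kubernetes documentation; the label values and the deployment.yaml resource are hypothetical.

```shell
# Write an illustrative kustomization.yaml into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml                             # hypothetical resource file
labels:
  - pairs:
      app.kubernetes.io/name: tad               # hypothetical values
      app.kubernetes.io/part-of: ai-validation
      app.kubernetes.io/managed-by: flux
    includeSelectors: false                     # label metadata only, leave selectors unchanged
EOF
# Show the generated example file.
cat "$workdir/kustomization.yaml"

# With cluster access, labelled resources can then be selected, for example:
# kubectl get all -n tn-ai-validation-tad -l app.kubernetes.io/name=tad
```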

Kubernetes (digilab)

Every Kubernetes cluster has a slightly different setup and set of available services. We are currently working on the digilab cloud, which provides the following capabilities:

  • cert-manager for TLS certificate management
  • flux for GitOps deployment
  • grafana for a metrics dashboard
  • loki for log collection
  • pinniped for authentication to Kubernetes
  • traefik as ingress controller
  • CloudNativePG for Postgres databases
  • sops for secret encryption
  • prometheus operator (PodMonitor & Alertmanager)

Access

To get access, you need a Pleio account with the correct permissions and pinniped installed.

To install pinniped, follow the pinniped install tutorial. To get the correct permissions for your Pleio account, ask a colleague.

Namespaces

The AI Validation team has access to the following namespaces:

  • tn-ai-validation-grafana: Grafana dashboard for our team (managed by digilab)
  • tn-ai-validation-infra: general infra not managed by Flux; currently runs Vault
  • tn-ai-validation-keycloak: Keycloak setup
  • tn-ai-validation-llm-benchmarks: runs LLM benchmark software
  • tn-ai-validation-playground: experiments; can be removed at any moment
  • tn-ai-validation-tad: runs tad releases with pgAdmin
  • tn-ai-validation-tad-staging: runs the tad main branch with pgAdmin
  • tn-ai-validation-vault: intended to host the Vault instance that currently runs in tn-ai-validation-infra; migration still needed
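Once pinniped and your kubeconfig are set up, a quick check like the sketch below can confirm which namespaces you can actually read. The selection of namespaces and the choice of "list pods" as the permission to test are assumptions; kubectl auth can-i is a standard kubectl subcommand.

```shell
# Namespaces taken from the list above; trim to the ones you need.
namespaces="tn-ai-validation-tad tn-ai-validation-tad-staging tn-ai-validation-playground"
checked=0
for ns in $namespaces; do
  if command -v kubectl >/dev/null 2>&1; then
    printf '%s: ' "$ns"
    # Prints "yes" or "no"; a non-zero exit (no permission, no cluster) must not abort the loop.
    kubectl auth can-i list pods --namespace "$ns" --request-timeout=5s || true
  else
    echo "$ns: kubectl not installed, skipping check"
  fi
  checked=$((checked + 1))
done
echo "checked $checked namespaces"
```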

Storage classes

The following storage classes are available for persistent storage:

NAME                       PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile                  file.csi.azure.com   Delete          Immediate              true                   364d
azurefile-csi              file.csi.azure.com   Delete          Immediate              true                   364d
azurefile-csi-nfs          file.csi.azure.com   Delete          Immediate              true                   364d
azurefile-csi-nfs-retain   file.csi.azure.com   Retain          Immediate              true                   350d
azurefile-csi-premium      file.csi.azure.com   Delete          Immediate              true                   364d
azurefile-premium          file.csi.azure.com   Delete          Immediate              true                   364d
default (default)          disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   364d
managed                    disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   364d
managed-csi                disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   364d
managed-csi-premium        disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   364d
managed-premium            disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   364d
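To show how one of these classes is selected, the sketch below writes a PersistentVolumeClaim that requests a disk from managed-csi. The claim name, size, and the use of the playground namespace are hypothetical choices for illustration.

```shell
# Write an illustrative PersistentVolumeClaim into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/pvc-example.yaml" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data                       # hypothetical name
  namespace: tn-ai-validation-playground   # assumed target namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi            # one of the classes listed above
  resources:
    requests:
      storage: 1Gi                         # hypothetical size
EOF
# Show the generated example manifest.
cat "$workdir/pvc-example.yaml"

# With cluster access, the claim could be created with:
# kubectl apply -f "$workdir/pvc-example.yaml"
```

Because managed-csi uses the WaitForFirstConsumer binding mode, such a claim stays Pending until a pod that uses it is scheduled. Classes with the Delete reclaim policy remove the underlying volume when the claim is deleted, while azurefile-csi-nfs-retain keeps it.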

License: European Union Public License 1.2

