siderolabs / omni

SaaS-simple deployment of Kubernetes - on your own hardware.

[bug] Workers are failing to provision

gerhard opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Given a 5-node cluster with 1 control plane machine and 4 worker machines, the control plane machine provisions successfully, but the worker machines fail to make any progress. This is what it looks like:

image

The cluster was created with the following command:

omnictl cluster template sync --file cluster.yaml --verbose

Here is the Omni UI view:
image

This is what the logs for one of the worker machines are showing:

11/03/2024 07:51:01 InvalidArgument [/machine.MachineService/ApplyConfiguration] 1.80544ms unary rpc error: code = InvalidArgument desc = configuration validation failed: 1 error occurred:
11/03/2024 07:51:01 * key/cert combination should not be empty

Here is one of the worker machine configs (they all have the same config):

kind: Machine
name: 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/ams24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml

I am using:

  • Omni 0.30.0
  • Talos 1.6.6

Expected Behavior

The worker nodes should provision successfully.

Steps To Reproduce

omnictl cluster template sync --file cluster.yaml --verbose

What browsers are you seeing the problem on?

No response

Anything else?

No response

The problem was that I had several control-plane-specific patches defined as cluster patches, so they were applied to every machine in the cluster, including the worker nodes. @frezbo spotted it straight away: https://taloscommunity.slack.com/archives/C04D4PDAJT0/p1710145129423209?thread_ts=1710144867.645529&cid=C04D4PDAJT0

image
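
To make the misplacement concrete, here is a minimal sketch of what the failing layout may have looked like. The original failing cluster.yaml is not shown in this issue, so this is a hypothetical reconstruction: it assumes that some of the patches which end up under kind: ControlPlane in the working template below were previously listed under kind: Cluster. Which patches were actually involved is an assumption.

```yaml
# Hypothetical reconstruction, not the author's actual failing file.
# Patch names are borrowed from the working template further down.
kind: Cluster
name: square-hole-2024-03-08
kubernetes:
  version: v1.29.2
talos:
  version: v1.6.6
patches:
  # Cluster-level patches are applied to every machine, workers included.
  - name: kubespan
    file: ../patches/kubespan.yaml
  # Assumed misplacement: control-plane-only patches listed at cluster scope.
  - name: etcd-metrics
    file: ../patches/etcd-metrics.yaml
  - name: kube-scheduler-metrics
    file: ../patches/kube-scheduler-metrics.yaml
---
kind: ControlPlane
machines:
  - b7e54219-3754-4b13-b379-8a1ebfc4cbe7 # par24-3
---
kind: Workers
machines:
  - 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6 # ams24-3
```

With a layout like this, worker machines would receive configuration sections that only make sense on a control plane node, which is consistent with the validation error reported in the worker logs. Moving those entries under kind: ControlPlane, as in the working template, keeps them off the workers.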

For future reference, the following cluster.yaml worked for me:

kind: Cluster
name: square-hole-2024-03-08
kubernetes:
  version: v1.29.2
talos:
  version: v1.6.6
features:
  backupConfiguration:
    interval: 24h
  diskEncryption: true
patches:
  - name: kubespan
    file: ../patches/kubespan.yaml
  # https://github.com/siderolabs/omni-feedback/issues/41
  # https://sysctl-explorer.net/vm/oom_kill_allocating_task/
  - name: oom-kill-allocating-task
    file: ../patches/oom-kill-allocating-task.yaml
  # Requires kubelet patch, otherwise it would be GitOps'd
  - name: metrics-server
    file: ../patches/metrics-server.yaml
---
kind: ControlPlane
machines:
  - b7e54219-3754-4b13-b379-8a1ebfc4cbe7 # par24-3
patches:
  # All the following are required for kube-prometheus-stack to access these metrics
  - name: etcd-metrics
    file: ../patches/etcd-metrics.yaml
  - name: kube-proxy-metrics
    file: ../patches/kube-proxy-metrics.yaml
  - name: kube-scheduler-metrics
    file: ../patches/kube-scheduler-metrics.yaml
  - name: kube-controller-manager-metrics
    file: ../patches/kube-controller-manager-metrics.yaml
  - name: sidecar-containers
    file: ../patches/cluster-sidecar-containers.yaml
---
kind: Workers
machines:
  - 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6 # ams24-3
  - f839fcb3-8c7e-4b9c-b9a9-04ddb307e438 # lon24-3
  - 66a3a5c0-2938-4afc-9762-4c21b20b9b98 # dus24-3
  - ca71bb36-3a0e-4fde-96eb-8db0fc445b2c # war24-3
---
kind: Machine
name: b7e54219-3754-4b13-b379-8a1ebfc4cbe7
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/par24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6
install:
  disk: /dev/sda
patches:
  - name: network
    file: patches/ams24-3-network.yaml
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: f839fcb3-8c7e-4b9c-b9a9-04ddb307e438
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/lon24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: 66a3a5c0-2938-4afc-9762-4c21b20b9b98
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/dus24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: ca71bb36-3a0e-4fde-96eb-8db0fc445b2c
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/war24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml