[bug] Workers are failing to provision
gerhard opened this issue
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Given a 5-node cluster with 1 control plane machine and 4 worker machines, the control plane machine provisions successfully, but the worker machines fail to make any progress. This is what it looks like:
Cluster was created with omnictl cluster template sync --file cluster.yaml --verbose
This is what the logs for one of the worker machines are showing:
11/03/2024 07:51:01 InvalidArgument [/machine.MachineService/ApplyConfiguration] 1.80544ms unary rpc error: code = InvalidArgument desc = configuration validation failed: 1 error occurred:
11/03/2024 07:51:01     * key/cert combination should not be empty
Here is the config for one of the worker machines (they all have the same config):
kind: Machine
name: 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/ams24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
I am using:
- Omni 0.30.0
- Talos 1.6.6
Expected Behavior
The worker nodes should provision successfully.
Steps To Reproduce
omnictl cluster template sync --file cluster.yaml --verbose
What browsers are you seeing the problem on?
No response
Anything else?
No response
The problem was that I had a bunch of control-plane-specific cluster patches applied to the entire cluster, including worker nodes. @frezbo spotted it straight away: https://taloscommunity.slack.com/archives/C04D4PDAJT0/p1710145129423209?thread_ts=1710144867.645529&cid=C04D4PDAJT0
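To make the mistake concrete, here is a minimal sketch of the kind of layout that was failing. It is a hypothetical reconstruction, not the original file: it assumes a control-plane patch such as etcd-metrics was listed under kind: Cluster, and the thread does not say which patch actually triggered the validation error.

```yaml
# Hypothetical reconstruction of the failing layout (not the original file).
kind: Cluster
name: square-hole-2024-03-08
kubernetes:
  version: v1.29.2
talos:
  version: v1.6.6
patches:
  # Patches listed here apply to the entire cluster, worker nodes included.
  - name: kubespan
    file: ../patches/kubespan.yaml
  # Assumed culprit: a control-plane-specific patch at the cluster level.
  # It belongs under kind: ControlPlane instead.
  - name: etcd-metrics
    file: ../patches/etcd-metrics.yaml
```

Moving such entries from the Cluster document to the ControlPlane document keeps them off the worker machines.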
For future reference, the following cluster.yaml worked for me:
kind: Cluster
name: square-hole-2024-03-08
kubernetes:
  version: v1.29.2
talos:
  version: v1.6.6
features:
  backupConfiguration:
    interval: 24h
  diskEncryption: true
patches:
  - name: kubespan
    file: ../patches/kubespan.yaml
  # https://github.com/siderolabs/omni-feedback/issues/41
  # https://sysctl-explorer.net/vm/oom_kill_allocating_task/
  - name: oom-kill-allocating-task
    file: ../patches/oom-kill-allocating-task.yaml
  # Requires kubelet patch, otherwise it would be GitOps'd
  - name: metrics-server
    file: ../patches/metrics-server.yaml
---
kind: ControlPlane
machines:
  - b7e54219-3754-4b13-b379-8a1ebfc4cbe7 # par24-3
patches:
  # All the following are required for kube-prometheus-stack to access these metrics
  - name: etcd-metrics
    file: ../patches/etcd-metrics.yaml
  - name: kube-proxy-metrics
    file: ../patches/kube-proxy-metrics.yaml
  - name: kube-scheduler-metrics
    file: ../patches/kube-scheduler-metrics.yaml
  - name: kube-controller-manager-metrics
    file: ../patches/kube-controller-manager-metrics.yaml
  - name: sidecar-containers
    file: ../patches/cluster-sidecar-containers.yaml
---
kind: Workers
machines:
  - 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6 # ams24-3
  - f839fcb3-8c7e-4b9c-b9a9-04ddb307e438 # lon24-3
  - 66a3a5c0-2938-4afc-9762-4c21b20b9b98 # dus24-3
  - ca71bb36-3a0e-4fde-96eb-8db0fc445b2c # war24-3
---
kind: Machine
name: b7e54219-3754-4b13-b379-8a1ebfc4cbe7
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/par24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: 2fbf5556-8b2d-48e2-aef3-afd3ba6aeed6
install:
  disk: /dev/sda
patches:
  - name: network
    file: patches/ams24-3-network.yaml
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: f839fcb3-8c7e-4b9c-b9a9-04ddb307e438
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/lon24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: 66a3a5c0-2938-4afc-9762-4c21b20b9b98
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/dus24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml
---
kind: Machine
name: ca71bb36-3a0e-4fde-96eb-8db0fc445b2c
install:
  disk: /dev/sda
patches:
  - name: tailscale
    file: ../patches/tailscale.yaml
  - name: network
    file: patches/war24-3-network.yaml
  - name: machine-sidecar-containers
    file: ../patches/machine-sidecar-containers.yaml
  - name: kubeprism
    file: ../patches/kubeprism.yaml