kubernetes / kubeadm

Aggregator for issues filed against kubeadm

Do not write a resolvConf value in the global KubeletConfiguration; write it dynamically per node

ilia1243 opened this issue

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): v1.29.1

Environment:

  • Kubernetes version (use kubectl version): v1.29.1
  • Cloud provider or hardware configuration: bare-metal
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.1 LTS
  • Kernel (e.g. uname -a): 5.15.0-50-generic
  • Container runtime (CRI) (e.g. containerd, cri-o): containerd=1.6.12-0ubuntu1~22.04.3
  • Container networking plugin (CNI) (e.g. Calico, Cilium): calico
  • Others:

What happened?

If kubeadm init is run on an Ubuntu 20.04 node and kubeadm join on a RHEL 9 node, the join fails with "open /run/systemd/resolve/resolv.conf: no such file or directory" in the kubelet logs.

Workaround: use kubeadm patches, or delete resolvConf from the kubelet-config ConfigMap before joining.

What you expected to happen?

kubeadm init should not write a default resolvConf into the KubeletConfiguration stored in the kubelet-config ConfigMap. Instead, resolvConf should be omitted from the ConfigMap, and the real value in /var/lib/kubelet/config.yaml should be calculated dynamically on each node, depending on whether the systemd-resolved service is active.

How to reproduce it (as minimally and precisely as possible)?

See What happened?.

Anything else we need to know?

kubeadm init should not write a default resolvConf in the KubeletConfiguration. Instead, resolvConf should be omitted, and the real value in the kubelet's config.yaml calculated dynamically depending on whether the systemd-resolved service is active.

this is intended.

kubeadm will only update the KubeletConfiguration.ResolverConfig field if the systemd-resolved service is active:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/componentconfigs/kubelet.go#L200-L213
/run/systemd/resolve/resolv.conf is a valid path if systemd-resolved is managing resolv.conf.

$ ls -l /run/systemd/resolve/resolv.conf
-rw-r--r-- 1 systemd-resolve systemd-resolve 786 Mar  4 15:24 /run/systemd/resolve/resolv.conf
$ systemctl status systemd-resolved | grep active
     Active: active (running) since Mon 2024-03-04 15:24:27 EET; 1min 17s ago
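That logic can be sketched in Go. This is a simplified illustration, not the actual kubeadm code: kubeadm goes through its initsystem abstraction rather than shelling out to systemctl directly, and the function names here are made up for the example.

```go
package main

import (
	"fmt"
	"os/exec"
)

// systemdResolvedActive reports whether the systemd-resolved service is
// running. Calling systemctl directly is a simplification for this sketch;
// kubeadm uses its initsystem abstraction instead.
func systemdResolvedActive() bool {
	return exec.Command("systemctl", "is-active", "--quiet", "systemd-resolved").Run() == nil
}

// resolvConfFor returns the resolvConf path kubeadm would set: the file
// maintained by systemd-resolved when the service is active, otherwise
// empty so the kubelet falls back to its own default (/etc/resolv.conf).
func resolvConfFor(active bool) string {
	if active {
		return "/run/systemd/resolve/resolv.conf"
	}
	return ""
}

func main() {
	fmt.Println("resolvConf:", resolvConfFor(systemdResolvedActive()))
}
```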

if the service is active but the file is missing, then that problem must be fixed.

if the service is not active the kubelet will default the field to /etc/resolv.conf:
https://github.com/kubernetes/kubelet/blob/master/config/v1beta1/types.go#L437-L442
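That defaulting boils down to the following sketch (in the real v1beta1 API the field is a *string, and "unset" means a nil pointer; an empty string stands in for that here):

```go
package main

import "fmt"

// effectiveResolvConf sketches the kubelet's v1beta1 defaulting: when
// resolverConfig is left unset, the kubelet falls back to /etc/resolv.conf.
func effectiveResolvConf(resolverConfig string) string {
	if resolverConfig == "" {
		return "/etc/resolv.conf"
	}
	return resolverConfig
}

func main() {
	fmt.Println(effectiveResolvConf(""))                                 // kubelet default
	fmt.Println(effectiveResolvConf("/run/systemd/resolve/resolv.conf")) // explicit value wins
}
```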

/run/systemd/resolve/resolv.conf is a valid path if systemd-resolved is managing resolv.conf

Please check the What happened? section. When different OSes are used, systemd-resolved is not managing resolv.conf on RHEL 9, but the kubelet still tries to open /run/systemd/resolve/resolv.conf.

again, if the service systemd-resolved is enabled the path passed to kubelet should be /run/systemd/resolve/resolv.conf.
if that is not correct on a certain distro, then it's a problem with systemd-resolved on that distro, i'd say.

In the mentioned case systemd-resolved is disabled on RHEL 9. Let me rephrase the test case:

  1. Init first Kubernetes node on Ubuntu 20.04. systemd-resolved is active.

    Actual: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf in both kubelet-config ConfigMap and in /var/lib/kubelet/config.yaml.

    Proposed: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf only in /var/lib/kubelet/config.yaml, but omits the resolvConf property in kubelet-config ConfigMap.

  2. Join second Kubernetes node on RHEL9. systemd-resolved is inactive.

    Actual: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf into /var/lib/kubelet/config.yaml from the kubelet-config ConfigMap, and the kubelet fails.

    Proposed: Since the property is absent from the ConfigMap after step 1, Kubeadm uses the default /etc/resolv.conf in /var/lib/kubelet/config.yaml.

ok, now i understand the problem. this was not clear in your description.

so first of all, most users run the same distro or distro family across a single cluster, so kubeadm is correct for those users: on all their nodes systemd-resolved is consistently either enabled or disabled.
if some node does not work with the default KubeletConfiguration then patches should be used. that is the correct solution.

what can be done to make kubeadm better here is to:

  1. don't write any defaults in the kubeletconfiguration about resolvConf
    (move this logic to 2)
    https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/componentconfigs/kubelet.go#L200-L213
  2. mutate the kubelet configuration for a given node after it's downloaded:
    https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/kubelet/config.go#L49
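A rough Go sketch of step 2, assuming a hypothetical applyNodeLocalDefaults helper that would run on each node after the ConfigMap is downloaded. The struct is a stand-in for the real v1beta1 KubeletConfiguration (where ResolverConfig is a *string), and the detection of systemd-resolved would happen on the joining node itself:

```go
package main

import "fmt"

// KubeletConfiguration stand-in with only the field relevant here.
// The real type lives in k8s.io/kubelet/config/v1beta1.
type KubeletConfiguration struct {
	ResolverConfig string
}

// applyNodeLocalDefaults is a hypothetical helper for step 2 of the
// proposal: after the cluster-wide kubelet-config is downloaded during
// "kubeadm join", fill in node-local values that must not be shared via
// the ConfigMap. systemdResolvedActive is detected on the joining node.
func applyNodeLocalDefaults(cfg *KubeletConfiguration, systemdResolvedActive bool) {
	if cfg.ResolverConfig != "" {
		return // explicitly set by the user; leave it alone
	}
	if systemdResolvedActive {
		cfg.ResolverConfig = "/run/systemd/resolve/resolv.conf"
	}
	// otherwise leave it unset: the kubelet defaults to /etc/resolv.conf
}

func main() {
	// Ubuntu 20.04 control plane: systemd-resolved active.
	ubuntu := KubeletConfiguration{}
	applyNodeLocalDefaults(&ubuntu, true)
	fmt.Println("ubuntu:", ubuntu.ResolverConfig)

	// RHEL 9 worker joining with the same (resolvConf-free) ConfigMap.
	rhel := KubeletConfiguration{}
	applyNodeLocalDefaults(&rhel, false)
	fmt.Println("rhel: unset, kubelet falls back to /etc/resolv.conf")
}
```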

we are close to code freeze for 1.30. this can be changed for 1.31, but i don't think it should be backported.
we also need to understand if it's going to break existing users in some way.

PRs welcome, explained above:
#3034 (comment)