Do not write a resolvConf value in the global KubeletConfiguration; write it dynamically per node
ilia1243 opened this issue
Is this a BUG REPORT or FEATURE REQUEST?
FEATURE REQUEST
Versions
kubeadm version (use kubeadm version): v1.29.1
Environment:
- Kubernetes version (use kubectl version): v1.29.1
- Cloud provider or hardware configuration: bare-metal
- OS (e.g. from /etc/os-release): Ubuntu 22.04.1 LTS
- Kernel (e.g. uname -a): 5.15.0-50-generic
- Container runtime (CRI) (e.g. containerd, cri-o): containerd=1.6.12-0ubuntu1~22.04.3
- Container networking plugin (CNI) (e.g. Calico, Cilium): calico
- Others:
What happened?
If the kubeadm init node runs Ubuntu 20.04 and the kubeadm join node runs RHEL9, joining fails with "open /run/systemd/resolve/resolv.conf: no such file or directory" in the kubelet logs.
Workaround: use patches, or delete resolvConf from the kubelet-config ConfigMap before joining.
What you expected to happen?
kubeadm init does not write a default resolvConf into the KubeletConfiguration stored in the kubelet-config ConfigMap. Instead, resolvConf is omitted from the ConfigMap, and the real value in /var/lib/kubelet/config.yaml is calculated dynamically on each node, depending on whether the systemd-resolved service is active.
How to reproduce it (as minimally and precisely as possible)?
See What happened?.
Anything else we need to know?
kubeadm init does not write default resolvConf in KubeletConfiguration. Instead, resolvConf is omitted, and real value in kubelet config.yaml is calculated dynamically depending on if systemd-resolved service is active.
this is intended.
kubeadm will only update the KubeletConfiguration.ResolverConfig field if the systemd-resolved service is active:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/componentconfigs/kubelet.go#L200-L213
/run/systemd/resolve/resolv.conf is a valid path if systemd-resolved is managing resolv.conf.
$ ls -l /run/systemd/resolve/resolv.conf
-rw-r--r-- 1 systemd-resolve systemd-resolve 786 Mar 4 15:24 /run/systemd/resolve/resolv.conf
$ systemctl status systemd-resolved | grep active
Active: active (running) since Mon 2024-03-04 15:24:27 EET; 1min 17s ago
if the service is active but the file is missing, then that problem must be fixed.
if the service is not active the kubelet will default the field to /etc/resolv.conf:
https://github.com/kubernetes/kubelet/blob/master/config/v1beta1/types.go#L437-L442
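As a rough illustration of the behaviour described in this comment, the two stages can be modelled separately: kubeadm either sets the path or leaves the field unset, and the kubelet then applies its own default. This is only a sketch; the helper names kubeadmResolvConf and effectiveResolvConf are hypothetical and do not appear in kubeadm or the kubelet, and the empty string stands in for "field unset".

```go
package main

import "fmt"

const (
	// Path kubeadm writes when systemd-resolved is active (per the thread).
	systemdResolvConf = "/run/systemd/resolve/resolv.conf"
	// Default the kubelet applies when the field is unset (per the thread).
	kubeletDefaultResolvConf = "/etc/resolv.conf"
)

// kubeadmResolvConf is a hypothetical helper: it returns the value kubeadm
// would write, or "" to mean "leave the field unset".
func kubeadmResolvConf(resolvedActive bool) string {
	if resolvedActive {
		return systemdResolvConf
	}
	return ""
}

// effectiveResolvConf is a hypothetical helper that models the kubelet
// falling back to its default when no value was configured.
func effectiveResolvConf(configured string) string {
	if configured == "" {
		return kubeletDefaultResolvConf
	}
	return configured
}

func main() {
	for _, active := range []bool{true, false} {
		path := effectiveResolvConf(kubeadmResolvConf(active))
		fmt.Printf("systemd-resolved active=%v -> %s\n", active, path)
	}
}
```

The point of splitting the two helpers is that the decision is made once, on the machine where kubeadm evaluates the service state, which is what the rest of the thread goes on to question.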
/run/systemd/resolve/resolv.conf is a valid path if systemd-resolved is managing resolv.conf
Please check the What happened? section. If different OSes are used, systemd-resolved is not managing resolv.conf on RHEL9, but the kubelet still tries to open /run/systemd/resolve/resolv.conf.
again, if the service systemd-resolved is enabled the path passed to kubelet should be /run/systemd/resolve/resolv.conf.
if that is not correct on a certain distro, then it's a problem with systemd-resolved on that distro, i'd say.
In the mentioned case systemd-resolved is disabled for RHEL9. Let me rephrase the test case:
1. Init first Kubernetes node on Ubuntu 20.04. systemd-resolved is active.
   Actual: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf in both the kubelet-config ConfigMap and /var/lib/kubelet/config.yaml.
   Proposed: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf only in /var/lib/kubelet/config.yaml, but omits the resolvConf property in the kubelet-config ConfigMap.
2. Join second Kubernetes node on RHEL9. systemd-resolved is inactive.
   Actual: Kubeadm writes resolvConf: /run/systemd/resolve/resolv.conf in /var/lib/kubelet/config.yaml using the kubelet-config ConfigMap, and kubelet fails.
   Proposed: Since the property is absent in the ConfigMap at step 1, Kubeadm uses the default /etc/resolv.conf in /var/lib/kubelet/config.yaml.
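The "Actual" branch of this test case can be reproduced on paper with a small model. The sketch below is not kubeadm code: the node type and the sharedConfig and brokenOn helpers are hypothetical, and the only facts taken from the thread are the path and the rule that the init node's value is copied to every joining node.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a cluster machine.
type node struct {
	name           string
	resolvedActive bool // whether systemd-resolved is active on this node
}

const systemdResolvConf = "/run/systemd/resolve/resolv.conf"

// sharedConfig models the resolvConf value the init node stores in the
// kubelet-config ConfigMap under the current behaviour.
func sharedConfig(initNode node) string {
	if initNode.resolvedActive {
		return systemdResolvConf
	}
	return "" // unset; the kubelet falls back to /etc/resolv.conf
}

// brokenOn reports whether a node that copies the shared value verbatim
// ends up pointing at a file systemd-resolved is not maintaining there.
func brokenOn(n node, shared string) bool {
	return shared == systemdResolvConf && !n.resolvedActive
}

func main() {
	ubuntu := node{name: "ubuntu-20.04", resolvedActive: true}
	rhel := node{name: "rhel9", resolvedActive: false}

	shared := sharedConfig(ubuntu) // decided once, on the init node
	for _, n := range []node{ubuntu, rhel} {
		fmt.Printf("%s: resolvConf=%q broken=%v\n", n.name, shared, brokenOn(n, shared))
	}
}
```

In this model the mismatch only appears when the init node and a joining node disagree about systemd-resolved, which matches the mixed-distro scenario in the report.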
ok, now i understand the problem. this was not clear in your description.
so first of all, most of the users use the same distro or distro family for a single cluster, so kubeadm is correct for these users. over there systemd-resolved is really enabled or not.
if some node does not work with the default kubeletconfiguration then patches should be used. that is the correct solution.
what can be done to make kubeadm better here is to:
1. don't write any defaults in the kubeletconfiguration about resolvConf (move this logic to 2):
   https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/componentconfigs/kubelet.go#L200-L213
2. mutate the kubelet configuration for a given node after it's downloaded:
   https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/kubelet/config.go#L49
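One possible shape for the two steps above is sketched here, purely as an illustration. The kubeletConfig struct is a trimmed-down, hypothetical stand-in for KubeletConfiguration, and mutateForNode is an invented name rather than an existing kubeadm function. Leaving an explicitly set value untouched is an assumption of this sketch, not something decided in the thread.

```go
package main

import "fmt"

// kubeletConfig is a hypothetical, trimmed-down stand-in for
// KubeletConfiguration; only the field discussed here is modelled.
type kubeletConfig struct {
	ResolvConf *string // nil means "unset"
}

const systemdResolvConf = "/run/systemd/resolve/resolv.conf"

// mutateForNode is a hypothetical per-node step applied to the config
// after it is downloaded. It fills resolvConf only when the field is
// unset and systemd-resolved is active on this node; an explicit value
// is left alone (an assumption of this sketch).
func mutateForNode(cfg kubeletConfig, resolvedActive bool) kubeletConfig {
	if cfg.ResolvConf == nil && resolvedActive {
		path := systemdResolvConf
		cfg.ResolvConf = &path
	}
	return cfg
}

func main() {
	shared := kubeletConfig{} // shared config: resolvConf omitted (step 1)
	for _, active := range []bool{true, false} {
		local := mutateForNode(shared, active) // per-node mutation (step 2)
		if local.ResolvConf == nil {
			fmt.Printf("active=%v -> unset (kubelet default applies)\n", active)
			continue
		}
		fmt.Printf("active=%v -> %s\n", active, *local.ResolvConf)
	}
}
```

Because the config is passed by value, the shared copy stays without a resolvConf entry, so each node's decision depends only on its own systemd-resolved state.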
we are close to code freeze for 1.30. this can be changed for 1.31, but i don't think it should be backported.
we also need to understand if it's going to break existing users in some way.
PRs welcome, explained above:
#3034 (comment)