bug: kubelet panic & crash if `--config-dir` is used
hegerdes opened this issue · comments
What happened?
I created a v1.30.0 k8s cluster with kubeadm and created a drop-in directory for kubelet configuration under /etc/kubernetes/kubelet.conf.d
. The normal conf file created by kubeadm is in /var/lib/kubelet/config.yaml
According to the docs the kubelet should merge all configs in specified order and start. But the kubelet crashes under two setups:
Setup 1:
The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf
is completely empty the kubelet crashes with:
>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:47:28.749138 9050 run.go:74] "command failed" err="failed to merge kubelet configs: failed to walk through kubelet dropin directory \"/etc/kubernetes/kubelet.conf.d\": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: kubelet config file \"/etc/kubernetes/kubelet.conf.d/20-kubelet.conf\" was empty"
If there is just one space or an newline in that file it crashes with:
>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:49:26.243517 9062 run.go:74] "command failed" err=<
failed to merge kubelet configs: failed to walk through kubelet dropin directory "/etc/kubernetes/kubelet.conf.d": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: Object 'Kind' is missing in '
'
Even when the docs say:
These files may contain partial configurations and might not be valid config files by themselves. Validation is only performed on the final resulting configuration structure stored internally in the kubelet.
If there is nothing or only whitspaces in /etc/kubernetes/kubelet.conf.d
the merged config should not be invalid and not every file in /etc/kubernetes/kubelet.conf.d
must contain a kind
property.
Setup 2:
The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf
is the same as the /var/lib/kubelet/config.yaml
the kubelet panics and crashes with:
> kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
panic: non-positive interval for NewTicker
goroutine 1 [running]:
time.NewTicker(0xc00098f7a0?)
time/tick.go:22 +0xe5
k8s.io/klog/v2/internal/clock.RealClock.NewTicker(...)
k8s.io/klog/v2@v2.120.1/internal/clock/clock.go:111
k8s.io/klog/v2.(*flushDaemon).run(0xc000229650, 0x0)
k8s.io/klog/v2@v2.120.1/klog.go:1169 +0x10f
k8s.io/klog/v2.StartFlushDaemon(0x0)
k8s.io/klog/v2@v2.120.1/klog.go:1220 +0x36
k8s.io/component-base/logs/api/v1.apply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130})
k8s.io/component-base/logs/api/v1/options.go:285 +0x890
k8s.io/component-base/logs/api/v1.validateAndApply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130}, 0x4?)
k8s.io/component-base/logs/api/v1/options.go:132 +0x71
k8s.io/component-base/logs/api/v1.ValidateAndApplyAsField(...)
k8s.io/component-base/logs/api/v1/options.go:124
k8s.io/kubernetes/cmd/kubelet/app.NewKubeletCommand.func1(0xc0000d8908, {0xc000100060, 0x4, 0x4})
k8s.io/kubernetes/cmd/kubelet/app/server.go:238 +0x4bc
github.com/spf13/cobra.(*Command).execute(0xc0000d8908, {0xc000100060, 0x4, 0x4})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x882
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000d8908)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
k8s.io/component-base/cli.run(0xc0000d8908)
k8s.io/component-base/cli/run.go:146 +0x290
k8s.io/component-base/cli.Run(0xc0000061c0?)
k8s.io/component-base/cli/run.go:46 +0x17
main.main()
k8s.io/kubernetes/cmd/kubelet/kubelet.go:36 +0x18
Using the same config only via --config
works.
What did you expect to happen?
The kubelet should start with empty files in /etc/kubernetes/kubelet.conf.d
or if the files in /var/lib/kubelet/config.yaml
and /etc/kubernetes/kubelet.conf.d
are the same.
If this is not the expected usage, the docs have to be adjusted.
How can we reproduce it (as minimally and precisely as possible)?
Create a cluster with kubeadm v1.30 with kublet config in /var/lib/kubelet/config.yaml
and /etc/kubernetes/kubelet.conf.d
or setup kubelet by hand.
# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
config: /var/lib/kubelet/config.yaml
config-dir: /etc/kubernetes/kubelet.conf.d
Anything else we need to know?
The kublet conf api version is apiVersion: kubelet.config.k8s.io/v1beta1
and is was with and without setting KUBELET_CONFIG_DROPIN_DIR_ALPHA
Functionality was introduced in #119390
Kubernetes version
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.0", GitCommit:"7c48c2bd72b9bf5c44d21d7338cc7bea77d0ad2a", GitTreeState:"clean", BuildDate:"2024-04-17T17:34:08Z", GoVersion:"go1.22.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux controlplane-node-amd64-vcxkzu 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
/sig node
I think it is a doc issue for step 1
.
From KEP, it said,
If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file
I think it is a doc issue for
step 1
.From KEP, it said,
If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file
Just to be sure I recreated the setup given in the KEP. It did not work. Every file in the drop-ins seems to be required to have a kind
(which suggest the validation is not just done at the end) and empty files produce an error.
There seems to be something wrong while merging the files.
yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?
yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?
+1 to what @haircommander said. Let me go ahead and update the docs.
/triage accepted
per #124796 (comment) and #124796 (comment)
I think in the case/step 2 described above the kubelet should preferably abort like it does in the case/step1 rather than crash with a stacktrace. Perhaps a stretch goal?