kubernetes / kubernetes

Production-Grade Container Scheduling and Management

Home Page:https://kubernetes.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

bug: kubelet panic & crash if `--config-dir` is used

hegerdes opened this issue · comments

What happened?

I created a v1.30.0 k8s cluster with kubeadm and created a drop-in directory for kubelet configuration under /etc/kubernetes/kubelet.conf.d. The normal conf file created by kubeadm is in /var/lib/kubelet/config.yaml

According to the docs the kubelet should merge all configs in specified order and start. But the kubelet crashes under two setups:

Setup 1:

The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf is completely empty the kubelet crashes with:

>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:47:28.749138    9050 run.go:74] "command failed" err="failed to merge kubelet configs: failed to walk through kubelet dropin directory \"/etc/kubernetes/kubelet.conf.d\": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: kubelet config file \"/etc/kubernetes/kubelet.conf.d/20-kubelet.conf\" was empty"

If there is just one space or an newline in that file it crashes with:

>kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
E0510 13:49:26.243517    9062 run.go:74] "command failed" err=<
        failed to merge kubelet configs: failed to walk through kubelet dropin directory "/etc/kubernetes/kubelet.conf.d": failed to load kubelet dropin file, path: /etc/kubernetes/kubelet.conf.d/20-kubelet.conf, error: Object 'Kind' is missing in '
        '

Even when the docs say:

These files may contain partial configurations and might not be valid config files by themselves. Validation is only performed on the final resulting configuration structure stored internally in the kubelet.

If there is nothing or only whitspaces in /etc/kubernetes/kubelet.conf.d the merged config should not be invalid and not every file in /etc/kubernetes/kubelet.conf.d must contain a kind property.

Setup 2:
The /etc/kubernetes/kubelet.conf.d/20-kubelet.conf is the same as the /var/lib/kubelet/config.yaml the kubelet panics and crashes with:

> kubelet --config-dir /etc/kubernetes/kubelet.conf.d --config /var/lib/kubelet/config.yaml
panic: non-positive interval for NewTicker

goroutine 1 [running]:
time.NewTicker(0xc00098f7a0?)
        time/tick.go:22 +0xe5
k8s.io/klog/v2/internal/clock.RealClock.NewTicker(...)
        k8s.io/klog/v2@v2.120.1/internal/clock/clock.go:111
k8s.io/klog/v2.(*flushDaemon).run(0xc000229650, 0x0)
        k8s.io/klog/v2@v2.120.1/klog.go:1169 +0x10f
k8s.io/klog/v2.StartFlushDaemon(0x0)
        k8s.io/klog/v2@v2.120.1/klog.go:1220 +0x36
k8s.io/component-base/logs/api/v1.apply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130})
        k8s.io/component-base/logs/api/v1/options.go:285 +0x890
k8s.io/component-base/logs/api/v1.validateAndApply(0xc0008c8a38, 0x0, {0x7f1ede53b2f0, 0xc000557130}, 0x4?)
        k8s.io/component-base/logs/api/v1/options.go:132 +0x71
k8s.io/component-base/logs/api/v1.ValidateAndApplyAsField(...)
        k8s.io/component-base/logs/api/v1/options.go:124
k8s.io/kubernetes/cmd/kubelet/app.NewKubeletCommand.func1(0xc0000d8908, {0xc000100060, 0x4, 0x4})
        k8s.io/kubernetes/cmd/kubelet/app/server.go:238 +0x4bc
github.com/spf13/cobra.(*Command).execute(0xc0000d8908, {0xc000100060, 0x4, 0x4})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x882
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000d8908)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
k8s.io/component-base/cli.run(0xc0000d8908)
        k8s.io/component-base/cli/run.go:146 +0x290
k8s.io/component-base/cli.Run(0xc0000061c0?)
        k8s.io/component-base/cli/run.go:46 +0x17
main.main()
        k8s.io/kubernetes/cmd/kubelet/kubelet.go:36 +0x18

Using the same config only via --config works.

What did you expect to happen?

The kubelet should start with empty files in /etc/kubernetes/kubelet.conf.d or if the files in /var/lib/kubelet/config.yaml and /etc/kubernetes/kubelet.conf.d are the same.

If this is not the expected usage, the docs have to be adjusted.

How can we reproduce it (as minimally and precisely as possible)?

Create a cluster with kubeadm v1.30 with kublet config in /var/lib/kubelet/config.yaml and /etc/kubernetes/kubelet.conf.d or setup kubelet by hand.

# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration

nodeRegistration:
  kubeletExtraArgs:
    config: /var/lib/kubelet/config.yaml
    config-dir: /etc/kubernetes/kubelet.conf.d

Anything else we need to know?

The kublet conf api version is apiVersion: kubelet.config.k8s.io/v1beta1 and is was with and without setting KUBELET_CONFIG_DROPIN_DIR_ALPHA

Functionality was introduced in #119390

Kubernetes version

$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.0", GitCommit:"7c48c2bd72b9bf5c44d21d7338cc7bea77d0ad2a", GitTreeState:"clean", BuildDate:"2024-04-17T17:34:08Z", GoVersion:"go1.22.2", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Hetzner/None

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux controlplane-node-amd64-vcxkzu 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

containerd github.com/containerd/containerd v1.7.16

Related plugins (CNI, CSI, ...) and versions (if applicable)

not relevant

/sig node

I think it is a doc issue for step 1.

From KEP, it said,

If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file

I think it is a doc issue for step 1.

From KEP, it said,

If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file

Just to be sure I recreated the setup given in the KEP. It did not work. Every file in the drop-ins seems to be required to have a kind (which suggest the validation is not just done at the end) and empty files produce an error.

There seems to be something wrong while merging the files.

yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?

yeah the documentation doesn't match the behavior. I think we need to keep the behavior this way, where the kubelet fails on any invalid configuration file, because an admin's intent may not be fulfilled in the resulting configuration if we skip the invalid configs and continue the config resolution without taking the funky one into account. @sohankunkerkar do you have bandwidth to update the docs?

+1 to what @haircommander said. Let me go ahead and update the docs.

I think in the case/step 2 described above the kubelet should preferably abort like it does in the case/step1 rather than crash with a stacktrace. Perhaps a stretch goal?