kubernetes / kubeadm

Aggregator for issues filed against kubeadm


kubeadm: keep dns resourceRequirements on upgrade

maxl99 opened this issue · comments

What happened?

After an upgrade with kubeadm, the resourceRequirements are reset.
Currently the memory limit is hardcoded to 170M in the template.

In our big prod cluster (1000 nodes, 5k services, 20k pods) CoreDNS needs more than 170M.
So we increased the memory limit in the deployment after an outage (all CoreDNS pods got OOM killed).
After kubeadm upgrade the deployment gets reset to 170M.
-> Due to that, all CoreDNS pods restart (during startup the probes are successful) and get OOM-killed again after loading/caching all DNS entries.
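
For illustration, assuming the hardcoded value referred to above is the 170Mi memory limit in kubeadm's CoreDNS manifest (the exact value may differ between versions), this is one way to see what kubeadm re-applies on upgrade:

    # Inspect the CoreDNS memory limit after a kubeadm upgrade
    kubectl -n kube-system get deployment coredns \
      -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
    # Roughly expected with the default kubeadm template: 170Mi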

What did you expect to happen?

Keep the resourceRequirements after kubeadm upgrade.

How can we reproduce it (as minimally and precisely as possible)?

  1. Change the CoreDNS deployment (increase the resourceRequests)
  2. Run kubeadm upgrade (see the sketch below)
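
A minimal sketch of the reproduction (the 512Mi value and <target-version> are placeholders, and CoreDNS is assumed to be the only container in the deployment):

    # 1. Raise the CoreDNS memory limit on the live deployment
    kubectl -n kube-system patch deployment coredns --type=json \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"512Mi"}]'

    # 2. Upgrade the control plane
    kubeadm upgrade apply <target-version>

    # 3. The limit is back at the templated value
    kubectl -n kube-system get deployment coredns \
      -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'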

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


/transfer kubeadm

@maxl99 the unwritten kubeadm policy about coredns is roughly the following - if you'd like customizations, you can deploy your own coredns and manage it. that said, we did add the feature to preserve the deployment replica count, so maybe we can also preserve other aspects of the existing deployment - but not ones that would block the coredns migrator code (like "image", IIRC).

cc @pacoxu @SataQiu WDYT?

your PR comes 6 days before code freeze for 1.30, so i think it's a problem to merge it without discussion:
kubernetes/kubernetes#123586

/remove-kind bug
/kind feature
(not a bug)

@neolit123 In my opinion it would be nice to preserve some specs in the deployment / read them from the existing k8s deployment. It is really bad to hardcode memory limits like in coredns (170M) without an option to change them.
This caused an outage in our main prod cluster... DNS always has a big impact :D
This would happen again and again with every "kubeadm upgrade".

can we understand better what else can be preserved from the deployment and handle it all in a single pr?

In our env we touched only the resourceRequirements and the replicas (already handled by kubernetes/kubernetes#85837).
So for me it would be fine to do the same with the resourceRequirements.

But I imagine that there are also some other points for other people.

that said we did add the feature to preserve the deployment replica count, so maybe we can also preserve other aspects of the existing deployment. but not such that would block the coredns migrator code (like "image" IIRC).

IMO, if we supported every feature (replica, resource, label, annotation, ...) this way, it would make the code difficult to maintain.
Perhaps we should support applying custom patches on addons in the v1beta4 API, even during the upgrade process.

Can we support kube-proxy and coredns patches like control plane components patches in https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-Patches?

BTW, can we skip dns upgrade like etcd?

      --etcd-upgrade                       Perform the upgrade of etcd. (default true)
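
For context, a rough sketch of how the existing control plane --patches mechanism from the link above is used today (the directory path and target version are examples; patch file names follow the documented target[suffix][+patchtype].extension convention):

    # A strategic merge patch for kube-apiserver would live at e.g.:
    #   /etc/kubernetes/patches/kube-apiserver+strategic.yaml
    # Recent kubeadm versions also accept the flag on upgrade:
    kubeadm upgrade apply <target-version> --patches /etc/kubernetes/patches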

Can we support kube-proxy and coredns patches like control plane components patches in https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-Patches?
BTW, can we skip dns upgrade like etcd?

kube-proxy is a DaemonSet, so it's controlled by a single config (which we create on init); arguably it doesn't need patches.
the addons are skipped during upgrade if they are not found - we also have e2e for that - so users can use addons with different ConfigMap names today.

IMO, if we supported every feature (replica, resource, label, annotation, ...) this way, it would make the code difficult to maintain.
Perhaps we should support applying custom patches on addons in the v1beta4 API, even during the upgrade process.

sadly our dns code is already a mess because of the coredns migration integration.
i was thinking that maybe a DeepCopy of the active Deployment + some edits would not be a lot of code.
generally agree that a patch might be cleaner and less code for us to own.

is "coredns" as a patch target OK, or should it be something like "corednsdeployment"?

@maxl99
we are discussing allowing users to patch coredns instead of auto preserving options
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#patches
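
A hypothetical sketch of what that could look like for CoreDNS, assuming the patches mechanism is extended to the dns addon; the target name "corednsdeployment", the 512Mi value, and the paths are placeholders, since the target naming is still being discussed above and this is not part of a released kubeadm API here:

    # Hypothetical patch file; "corednsdeployment" is a placeholder target name
    mkdir -p /etc/kubernetes/patches
    cat > /etc/kubernetes/patches/corednsdeployment+strategic.yaml <<'EOF'
    spec:
      template:
        spec:
          containers:
          - name: coredns
            resources:
              limits:
                memory: 512Mi
    EOF

    # The patches directory would then be passed to the upgrade, as with control plane patches:
    kubeadm upgrade apply <target-version> --patches /etc/kubernetes/patches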