kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle

Home Page: https://cluster-api.sigs.k8s.io

Allow patching CP template spec version in CC patch

Danil-Grigorev opened this issue

We have a unique problem in the Cluster API bootstrap provider for RKE2: it leads us to keep separate versions for the RKE2 control plane and the worker machines, because our RKE2 config is always an auto-generated resource derived from the control plane.

This blocks us from storing and using a version in the RKE2 config when provisioning control plane machines, and from any form of defaulting. It creates a need to patch the ClusterClass control plane template version directly, but that field is marked as preserved, so jsonPatch changes to it are rejected.

If patching the control plane template Kubernetes version were allowed in CAPI, we could specify a default version for the control plane while still letting users override it with a newer version through a custom variable value if the need arises. A sketch of such a patch is shown below.
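For illustration only, this is roughly what such a ClusterClass patch could look like if the version path were patchable. The RKE2ControlPlaneTemplate kind, its apiVersion, the variable name, and the exact path of the version field are assumptions on my side, and a patch like this is currently rejected because the version field is not an allowed patch target:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: rke2-example               # illustrative name
spec:
  variables:
    - name: controlPlaneVersion    # hypothetical variable carrying the CP version override
      required: false
      schema:
        openAPIV3Schema:
          type: string
          default: "v1.28.0"       # hypothetical default version
  patches:
    - name: control-plane-version-override
      definitions:
        - selector:
            apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # assumed CAPRKE2 API version
            kind: RKE2ControlPlaneTemplate
            matchResources:
              controlPlane: true
          jsonPatches:
            - op: replace
              path: /spec/template/spec/version   # assumed location of the version field
              valueFrom:
                variable: controlPlaneVersion
```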

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

As per the May 29th office hours discussion, this is not an easy change.

The topology controller is designed to be the single point of control of the Kubernetes version across the entire topology (control plane and workers), and the entire code that automates the upgrade sequence is built on this assumption.
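For context, the single point of control referred to here is the version field on the Cluster topology; the control plane and all worker MachineDeployments derive their Kubernetes version from it. A minimal sketch with illustrative names:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster            # illustrative name
  namespace: default
spec:
  topology:
    class: rke2-example            # illustrative ClusterClass name
    version: v1.28.0               # the one Kubernetes version for the whole topology
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker    # illustrative worker class name
          name: md-0
          replicas: 2
```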

Breaking this assumption by allowing patches to take control of the Kubernetes version for the control plane could not only introduce unexpected conditions that block the upgrade sequence (or, in the worst cases, leave the cluster in an unexpected or unrecoverable state), but could also cause failures in other parts of the code base, such as the cluster validation webhook or the MachineSet preflight checks, all of which are designed as safeguards to keep the different parts of the system within the version skew policies defined by Kubernetes or by kubeadm.

Considering this, I'm personally -1 to continue with this change.

Thanks @fabriziopandini, I apologize for the roughness of the change and for making a fuss. It turned out to be an issue on our CAPRKE2 provider side, caused by overly restrictive validation logic. After relaxing it, the Cluster version successfully propagated to both the control plane and the workers from a single location (which was the goal all along), so we are moving ahead with adding ClusterClass support. Your feedback and that of the other maintainers was really helpful in pointing us in the right direction and explaining the reasoning. I'm going to close the PR and this issue as they are no longer needed.

Great to hear!