kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle

Home Page: https://cluster-api.sigs.k8s.io

Allow patching CP template spec version in CC patch

Danil-Grigorev opened this issue

We have a unique problem in the Cluster API bootstrap provider for RKE2: it leads us to keep separate versions for the RKE2 control plane and the worker machines, because our RKE2 config is always an auto-generated resource derived from the control plane.

This blocks us from storing and using a version in the RKE2 config when provisioning control plane machines, and from any form of defaulting. It creates a need to patch the ClusterClass control plane template version directly, but that field is marked as preserved, so jsonPatch changes to it are rejected.

If patching the control plane template Kubernetes version were allowed in CAPI, we could specify a default version for the control plane while still letting users override it with a newer version through a custom variable value if the need arises. A sketch of such a patch is shown below.
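For illustration only, this is roughly what such a ClusterClass patch could look like if the version path were patchable. The RKE2ControlPlaneTemplate kind, its apiVersion, the variable name, and the exact path of the version field are assumptions on my side, and a patch like this is currently rejected because the version field is not an allowed patch target:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: rke2-example               # illustrative name
spec:
  variables:
    - name: controlPlaneVersion    # hypothetical variable carrying the CP version override
      required: false
      schema:
        openAPIV3Schema:
          type: string
          default: "v1.28.0"       # hypothetical default version
  patches:
    - name: control-plane-version-override
      definitions:
        - selector:
            apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # assumed CAPRKE2 API version
            kind: RKE2ControlPlaneTemplate
            matchResources:
              controlPlane: true
          jsonPatches:
            - op: replace
              path: /spec/template/spec/version   # assumed location of the version field
              valueFrom:
                variable: controlPlaneVersion
```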

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

As per the May 29th office hours discussion, this is not an easy change.

The topology controller is designed to be the single point of control of the Kubernetes version across the entire topology (control plane and workers), and the entire code that automates the upgrade sequence is built on this assumption.
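For context, the single point of control referred to here is the version field on the Cluster topology; the control plane and all worker MachineDeployments derive their Kubernetes version from it. A minimal sketch with illustrative names:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster            # illustrative name
  namespace: default
spec:
  topology:
    class: rke2-example            # illustrative ClusterClass name
    version: v1.28.0               # the one Kubernetes version for the whole topology
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker    # illustrative worker class name
          name: md-0
          replicas: 2
```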

Breaking this assumption by allowing patches to take control of the Kubernetes version for the control plane could not only introduce unexpected conditions that block the upgrade sequence (or, in the worst cases, leave the cluster in an unexpected or unrecoverable state), but could also cause failures in other parts of the code base, such as the cluster validation webhook or the MachineSet preflight checks, all of which are designed as safeguards to keep the different parts of the system within the version skew policies defined by Kubernetes or by kubeadm.

Considering this, I'm personally -1 to continue with this change.

Thanks @fabriziopandini, I apologize for the roughness of the change and for making a fuss. It turned out to be an issue on our CAPRKE2 provider side, caused by overly restrictive validation logic. After relaxing it, the Cluster version successfully propagated to both the control plane and the workers from a single location (which was the goal all along), so we are moving ahead with adding ClusterClass support. Your feedback and that of the other maintainers was really helpful in pointing us in the right direction and explaining the reasoning. I'm going to close the PR and this issue as they are no longer needed.

Great to hear!