kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle

Home Page: https://cluster-api.sigs.k8s.io

ClusterClass observed generation is not set when the provider is not installed first

Danil-Grigorev opened this issue

What steps did you take and what happened?

When a user creates a ClusterClass that references an infrastructure provider not yet installed in the cluster (the Docker provider, for example), the ClusterClass never sets observedGeneration. This blocks Clusters that reference the ClusterClass from provisioning.

To fix it, the Cluster and ClusterClass need to be removed, the provider needs to be installed, and both the ClusterClass and Cluster re-created from scratch.
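
For context on why a missing observedGeneration would block provisioning, the check can be sketched roughly as follows. This is a minimal Python sketch, not the Cluster API controller code: it assumes the Cluster topology logic waits until status.observedGeneration equals metadata.generation on the ClusterClass, and the function name and sample objects are hypothetical.

```python
def clusterclass_is_current(obj: dict) -> bool:
    """Return True when the ClusterClass status reflects its latest spec.

    `obj` is a ClusterClass as a plain dict (for example, parsed JSON).
    Assumption: a ClusterClass counts as reconciled only when
    status.observedGeneration matches metadata.generation.
    """
    generation = obj.get("metadata", {}).get("generation", 0)
    observed = obj.get("status", {}).get("observedGeneration", 0)
    return generation > 0 and observed == generation


# Hypothetical object as described in this issue: status never populated.
stuck = {"metadata": {"name": "quick-start", "generation": 1}, "status": {}}

# Hypothetical object after a successful reconcile.
reconciled = {
    "metadata": {"name": "quick-start", "generation": 1},
    "status": {"observedGeneration": 1},
}

for cc in (stuck, reconciled):
    state = "ready" if clusterclass_is_current(cc) else "blocked"
    print(f"{cc['metadata']['name']}: Cluster provisioning {state}")
```

The same comparison can be done by hand on the output of `kubectl get clusterclass <name> -o json`.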

What did you expect to happen?

The ClusterClass should eventually reconcile once the required provider is installed, and regular provisioning should start.

Cluster API version

v1.4.6, but it might also be present on the latest version; I need to verify.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.0", GitCommit:"1b4df30b3cdfeaba6024e81e559a6cd09a089d65", GitTreeState:"clean", BuildDate:"2023-04-11T17:10:18Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-15T00:36:28Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.
/area clusterclass

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

> To fix it, the Cluster and ClusterClass need to be removed, the provider needs to be installed, and both the ClusterClass and Cluster re-created from scratch.

That is really surprising to me. The ClusterClass controller should reconcile a ClusterClass at least once every 10m.
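
The expectation behind the 10m figure can be illustrated with a small simulation. This is a sketch under stated assumptions, not the controller's implementation: it models only a fixed periodic resync (the real controller also reacts to watch events), it assumes a reconcile succeeds as soon as the provider is installed, and `simulate_resync` and its parameters are hypothetical names.

```python
from typing import Optional

RESYNC_SECONDS = 10 * 60  # the "at least once every 10m" interval mentioned above


def simulate_resync(provider_installed_at: float,
                    horizon: float,
                    resync: float = RESYNC_SECONDS) -> Optional[float]:
    """Return the simulated time (seconds) of the first successful reconcile.

    Reconciles are attempted at t = 0, resync, 2 * resync, ... up to `horizon`.
    An attempt succeeds only if the provider is already installed at that time.
    Returns None if no attempt succeeds within the horizon.
    """
    now = 0.0
    while now <= horizon:
        if now >= provider_installed_at:
            return now  # observedGeneration would be set on this attempt
        now += resync
    return None


# Provider installed 15 minutes after the ClusterClass was created:
# the attempts at 0s and 600s fail, the attempt at 1200s succeeds.
print(simulate_resync(provider_installed_at=15 * 60, horizon=60 * 60))  # 1200.0
```

Under this model, the worst-case delay after installing the provider is one resync interval, which is why a permanently stuck ClusterClass would be unexpected.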

Can you provide more details on how exactly the ClusterClass is stuck and why you have to recreate it to get it to reconcile again?

I will try it again soon and come back with details. I tried to speed up the process by blindly applying an annotation to the ClusterClass to force a reconcile, but nothing changed. So I believe that even though I didn’t wait 10m, this should have forced the ClusterClass to be reconciled.

Okay, strange. The reconciliation is very straightforward and I'm not aware of anything that would result in this.

I double-checked the steps and the logs. The cluster was missing the kubeadm provider, which was installed later, after the ClusterClass and Cluster were created. It seems some kubeadm-related variable templating was failing to resolve, but this time it eventually started creating machines with no intervention required, just as you said. I’ll close this and keep an eye on it. It does seem like a fluke, though I got it reproduced twice.

Thx, sounds good so far! Let us know if you encounter it reproducibly. Would be very interested in getting it fixed if there is something to fix :)