lyft / flinkk8soperator

Kubernetes operator that provides control plane for managing Apache Flink applications

Update to FlinkApplication in ClusterStarting phase blocks transition to Running phase

anekdoti opened this issue

When a FlinkApplication custom resource is added, the controller of the flink-operator creates jobmanager and taskmanager deployments. These are labelled with a hash value, flink-hash, which is computed from the FlinkApplication, including all its annotations and labels.

During the lifetime of a FlinkApplication such annotations are sometimes added by other operators. A typical example is the helm-operator (https://github.com/fluxcd/helm-operator). When a FlinkApplication is created by the helm-operator on the basis of a HelmRelease referencing a Helm chart, an annotation helm.fluxcd.io/antecedent is added to the FlinkApplication shortly after its creation.

If the FlinkApplication is already in its Running phase, this leads to an update of the Flink cluster, i.e., the Flink cluster is recreated. This seems to be generally fine, but might be unnecessary when the change to the FlinkApplication does not change the properties of the Flink cluster itself.

However, when the update to the FlinkApplication happens while it is still in the ClusterStarting phase, the hash value of the FlinkApplication changes due to the update. As a consequence, the deployments for the jobmanager and taskmanagers cannot be found, as they are still labelled with the original hash value. Therefore, the method IsClusterReady of the controller always returns false, and the FlinkApplication never leaves the ClusterStarting phase. See fluxcd/helm-operator#243.
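The failure mode described above can be illustrated with a small, self-contained sketch. This is not the operator's code: the `Resource` type, `wholeResourceHash`, and `clusterReady` are hypothetical stand-ins, and the annotation value is a placeholder. It only assumes that the hash covers the annotations and that readiness is checked by looking up deployments under the current hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Resource is a hypothetical, heavily reduced stand-in for a FlinkApplication.
type Resource struct {
	Annotations map[string]string `json:"annotations"`
	Image       string            `json:"image"`
}

// wholeResourceHash hashes the entire resource, annotations included.
// This mirrors the behaviour described in the issue, not the real implementation.
func wholeResourceHash(r Resource) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:4])
}

// clusterReady reports whether deployments labelled with the resource's
// current hash exist (hypothetical simplification of a readiness check).
func clusterReady(deploymentsByHash map[string]bool, r Resource) bool {
	return deploymentsByHash[wholeResourceHash(r)]
}

func main() {
	r := Resource{Image: "flink:1.8"}
	// Deployments are created and labelled with the hash at creation time.
	deployed := map[string]bool{wholeResourceHash(r): true}
	fmt.Println("ready before annotation:", clusterReady(deployed, r))

	// Another operator adds an annotation while the cluster is still starting.
	r.Annotations = map[string]string{"helm.fluxcd.io/antecedent": "placeholder"}
	fmt.Println("ready after annotation:", clusterReady(deployed, r))
}
```

In this sketch the second lookup fails because the deployments still carry the hash computed before the annotation was added.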

One approach would be to compute the hash value not from the whole FlinkApplication resource, but only from the values that should actually trigger an update of the cluster.
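A minimal sketch of this idea, assuming the cluster-relevant values all live in the spec. The `FlinkApplication` and `FlinkApplicationSpec` types here are hypothetical, reduced stand-ins (the real spec has many more fields), and `specHash` is an illustrative name, not an existing function of the operator.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// FlinkApplicationSpec is a hypothetical, reduced stand-in for the real spec.
type FlinkApplicationSpec struct {
	Image       string `json:"image"`
	Parallelism int    `json:"parallelism"`
}

// FlinkApplication is a hypothetical stand-in for the custom resource.
type FlinkApplication struct {
	Annotations map[string]string
	Labels      map[string]string
	Spec        FlinkApplicationSpec
}

// specHash derives the hash from the spec only, so metadata changes
// (annotations, labels) do not alter it.
func specHash(app FlinkApplication) (string, error) {
	b, err := json.Marshal(app.Spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:4]), nil
}

func main() {
	app := FlinkApplication{Spec: FlinkApplicationSpec{Image: "flink:1.8", Parallelism: 2}}
	before, err := specHash(app)
	if err != nil {
		panic(err)
	}

	// An annotation added by another operator leaves the hash untouched.
	app.Annotations = map[string]string{"helm.fluxcd.io/antecedent": "placeholder"}
	after, err := specHash(app)
	if err != nil {
		panic(err)
	}
	fmt.Println("hash unchanged by annotation:", before == after)
}
```

Which fields belong in the hashed subset is a design decision; anything left out would no longer trigger a cluster update when it changes.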

A related problem is that all annotations of the FlinkApplication are propagated to the jobmanager and taskmanager deployments. For annotations like the one of the helm-operator mentioned above, this is not desirable as the annotation is used to identify resources that are explicitly managed by the helm-operator.
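One way to address this could be to filter annotations before copying them to the deployments. The following is only a sketch: `excludedAnnotationPrefixes` and `propagatedAnnotations` are hypothetical names, and the prefix list is an assumption that would more likely be configurable in practice.

```go
package main

import (
	"fmt"
	"strings"
)

// excludedAnnotationPrefixes is a hypothetical deny list of annotation key
// prefixes that should stay on the FlinkApplication only.
var excludedAnnotationPrefixes = []string{"helm.fluxcd.io/"}

// propagatedAnnotations returns a copy of in without the excluded keys.
// The input map is not modified.
func propagatedAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for key, value := range in {
		excluded := false
		for _, prefix := range excludedAnnotationPrefixes {
			if strings.HasPrefix(key, prefix) {
				excluded = true
				break
			}
		}
		if !excluded {
			out[key] = value
		}
	}
	return out
}

func main() {
	annotations := map[string]string{
		"helm.fluxcd.io/antecedent": "placeholder",
		"team":                      "data",
	}
	// Only annotations outside the excluded prefixes reach the deployments.
	fmt.Println(propagatedAnnotations(annotations))
}
```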