fluxcd / helm-controller

The GitOps Toolkit Helm reconciler, for declarative Helming

Home Page: https://fluxcd.io

Metallb installation with `driftDetection: mode: enabled` failed to apply revision

zaggash opened this issue

I'm trying to set up MetalLB with this Kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-metallb
  namespace: flux-system
spec:
  path: /apps/metallb-system/metallb/app
  sourceRef:
    kind: GitRepository
    name: apps
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta2
      kind: HelmRelease
      name: metallb
      namespace: metallb-system
  interval: 30m
  retryInterval: 1m
  timeout: 3m

And this HelmRelease:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  interval: 15m
  driftDetection:
    mode: enabled
  chart:
    spec:
      chart: metallb
      version: "0.13.12"
      sourceRef:
        kind: HelmRepository
        name: metallb-charts
        namespace: flux-system
  maxHistory: 3
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    crds: CreateReplace
    remediation:
      retries: 3
  uninstall:
    keepHistory: false
  values:
    controller:
      logLevel: warn
    speaker:
      logLevel: warn
    frr:
      enabled: false

Output of `flux version`:

flux: v2.2.0
distribution: flux-v2.2.1
helm-controller: v0.37.1
image-automation-controller: v0.37.0
image-reflector-controller: v0.31.1
kustomize-controller: v1.2.1
notification-controller: v1.2.3
source-controller: v1.2.3

`flux get kustomizations` shows that it never becomes Ready; its status stays Unknown.

This is what the helm-controller logs show:

kubectl -n flux-system logs helm-controller-<id>

{"level":"debug","ts":"2023-12-18T20:53:52.682Z","logger":"events","msg":"Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:\nCustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)\nCustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)","type":"Warning","object":{"kind":"HelmRelease","namespace":"metallb-system","name":"metallb","uid":"59eb65b4-d800-4f6e-96af-59891565efc6","apiVersion":"helm.toolkit.fluxcd.io/v2beta2","resourceVersion":"181220417"},"reason":"DriftDetected"}
{"level":"debug","ts":"2023-12-18T20:53:52.683Z","msg":"instructed to stop before running drift correction action reconciler correct cluster drift","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb-system"},"namespace":"metallb-system","name":"metallb","reconcileID":"8cc34888-d956-4d75-93d4-87c10f99a24e"}

The application installs successfully: the pods are Ready and the HelmRelease is marked as Ready.
However, the Kustomization never finishes.
It keeps trying to reconcile the HelmRelease for some reason.

I tried reconciling manually many times, including with `--with-source`.
I also tried removing the health check and setting `wait: true` instead; neither worked.

The only way to make it work is to remove every health check and `wait: true` statement; the Kustomization is then deployed successfully.

Can you please share the `.status` and the events of the HelmRelease object? It looks to me like the controller keeps observing drift for the release; if so, you could use ignore rules to exclude the affected fields.

The exact fields can be found in the controller's logs: they are logged as `resource modified` messages at debug level, with a `patch` field attached.

Here is the `.status` of the HelmRelease:

status:
  conditions:
  - lastTransitionTime: "2023-12-19T11:22:43Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 4
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-12-18T20:11:17Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-12-18T20:11:17Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"
    type: Released
  helmChart: flux-system/metallb-system-metallb
  history:
  - chartName: metallb
    chartVersion: 0.13.12
    configDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
    digest: sha256:e524142b85ae05a16d30ba30962e2a175d6381995bc71d463a97794211a15c98
    firstDeployed: "2023-12-18T20:11:04Z"
    lastDeployed: "2023-12-18T20:11:04Z"
    name: metallb
    namespace: metallb-system
    status: deployed
    version: 1
  lastAttemptedConfigDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
  lastAttemptedGeneration: 4
  lastAttemptedReleaseAction: install
  lastAttemptedRevision: 0.13.12
  lastHandledReconcileAt: "2023-12-18T21:53:52.159942829+01:00"
  lastHandledResetAt: "2023-12-18T21:53:52.159942829+01:00"
  observedGeneration: -1
  storageNamespace: metallb-system

I'm now setting the controller's log level to debug.
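For reference, one way to raise helm-controller's log level in a cluster set up with `flux bootstrap` is a kustomize patch in the `flux-system` directory. This is a sketch that assumes the default bootstrap layout (the `gotk-components.yaml` and `gotk-sync.yaml` file names) and relies on the controller's `--log-level` flag; since the deployment already passes `--log-level=info`, the patch appends a second flag on the assumption that the last occurrence wins.

```yaml
# flux-system/kustomization.yaml (sketch; assumes the default `flux bootstrap` layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append --log-level=debug to helm-controller's args; the later flag is
  # expected to override the default --log-level=info.
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --log-level=debug
    target:
      kind: Deployment
      name: helm-controller
```

Reading the raw JSON output with `kubectl -n flux-system logs deploy/helm-controller`, rather than a summarized view, should keep extra fields such as `patch` attached to each `resource modified` line. Switching back to `info` afterwards is advisable, as debug output is verbose.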

How can I extract this patch from the controller logs?
I can't find anything when I look at the debug logs.

It should be logged right after the "detected changes in cluster state" message, see:

https://github.com/fluxcd/helm-controller/blob/main/internal/reconcile/atomic_release.go#L377-L387

Without knowing the specific path, you should at least be able to confirm that the issue is indeed caused by detected drift by excluding the resource in full.
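A coarser check, different from excluding a single resource, is to switch drift detection to warn-only for a while. As we understand the v2beta2 API, `mode: warn` keeps reporting drift as events and logs without attempting to correct it; if the HelmRelease then settles and the Kustomization health check passes, the correction loop is what keeps it reconciling. This is a diagnostic sketch only (just the changed field is shown) and it leaves the drift in place.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  # Diagnostic only: report drift but do not try to correct it.
  driftDetection:
    mode: warn
```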

This is what I found in my logs.

2023-12-20T00:05:26.960Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:05:26.961Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drift 
2023-12-20T00:08:38.996Z info HelmRelease/metallb.metallb-system - HelmChart/flux-system/metallb-system-metallb with SourceRef 'HelmRepository/flux-system/metallb-charts' is in-sync 
2023-12-20T00:08:39.041Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state 
2023-12-20T00:08:39.280Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.280Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:08:39.296Z info HelmRelease/metallb.metallb-system - running 'correct cluster drift' action with timeout of 5m0s 
2023-12-20T00:08:39.318Z debug  - Cluster state of release metallb-system/metallb.v1 has been corrected:
CustomResourceDefinition/addresspools.metallb.io configured
CustomResourceDefinition/bgppeers.metallb.io configured 
2023-12-20T00:08:39.319Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state 
2023-12-20T00:08:39.541Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.541Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:08:39.542Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drif

If I understand correctly, I need to ignore both CRDs:
CustomResourceDefinition/addresspools.metallb.io
CustomResourceDefinition/bgppeers.metallb.io

For some reason, these are changed after Helm installs them, right?

We are experiencing this problem as well.
On top of that, we also see an incorrect status for the MetalLB HelmRelease (MetalLB is probably just one example here).

The status reports `dependency 'monitoring/xx' is not ready`:

status:
  conditions:
  - lastTransitionTime: "2024-01-12T11:10:18Z"
    message: dependency 'monitoring/xx' is not ready
    observedGeneration: 17
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2024-01-11T10:06:53Z"
    message: dependency 'monitoring/xx' is not ready
    observedGeneration: 3
    reason: DependencyNotReady
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-01-09T16:54:05Z"
    message: Helm install succeeded for release metallb/metallb.v1 with chart metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"

But the helm-controller logs show:

{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"checking 1 dependencies","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"all dependencies are ready","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.117Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.163Z","msg":"running 'correct cluster drift' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.584Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}

--> "msg":"all dependencies are ready"

Setting the log level to debug showed us the path of the (automatically) changed data.
Adding the following to the MetalLB HelmRelease fixed the reconciliation:

  driftDetection:
    ignore:
    - paths:
      - /spec/conversion/webhook/clientConfig/caBundle
      target:
        kind: CustomResourceDefinition
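The rule above ignores that path on every CustomResourceDefinition in the release. A narrower variant, sketched on the assumption that `target` also accepts a `name` (as Flux's kustomize-style selectors generally do), limits it to the two CRDs reported as drifting in this thread and keeps `mode: enabled` from the original HelmRelease.

```yaml
spec:
  driftDetection:
    mode: enabled
    ignore:
      # The caBundle of these two CRDs is changed in-cluster after install,
      # so only this field is excluded, and only for these CRDs.
      - paths:
          - /spec/conversion/webhook/clientConfig/caBundle
        target:
          kind: CustomResourceDefinition
          name: addresspools.metallb.io
      - paths:
          - /spec/conversion/webhook/clientConfig/caBundle
        target:
          kind: CustomResourceDefinition
          name: bgppeers.metallb.io
```

Two explicit rules are used instead of a name pattern so the sketch does not depend on regex matching in `target.name`.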

Still, I think the status message needs fixing: it does not seem to change when the HelmRelease goes into "ProgressingWithRetry". My guess (without looking at the code) is that it just keeps the previous status message.