gardener / gardener-extension-provider-azure

Gardener extension controller for the Azure cloud provider (https://azure.microsoft.com).

Home Page: https://gardener.cloud

`cloud-provider-config` Secret is not updated on Shoot deletion -> deadlock on Shoot deletion

ialidzhikov opened this issue

How to categorize this issue?

/area control-plane
/kind bug
/platform azure

What happened:
The cloud-provider-config Secret holds the Azure credentials for the cloud-controller-manager (CCM). Currently this Secret is created or updated only during ControlPlane reconciliation, not during ControlPlane deletion.

    // Get config chart values
    if a.configChart != nil {
        values, err := a.vp.GetConfigChartValues(ctx, cp, cluster)
        if err != nil {
            return false, err
        }
        // Apply config chart
        log.Info("Applying configuration chart")
        if err := a.configChart.Apply(ctx, a.chartApplier, cp.Namespace, nil, "", "", values); err != nil {
            return false, fmt.Errorf("could not apply configuration chart for controlplane '%s': %w", kutil.ObjectName(cp), err)
        }
    }

This leads to the following deadlock when a hibernated Shoot is deleted:

  1. A hibernated Shoot with invalid credentials gets deleted.

  2. As the Shoot is hibernated, the deletion fails to destroy the ControlPlane with the reason:

    task "Waiting until shoot control plane has been destroyed" failed: Failed to delete ControlPlane shoot--foo--test/test: Error deleting ControlPlane: error while waiting for managed resource containing shoot chart for controlplane 'shoot--foo--test/test' to be deleted: error while waiting for all resources to be deleted: retry failed with context deadline exceeded, last error: resource shoot--foo--test/extension-controlplane-shoot still exists:
    Could not clean all old resources: 2 errors occurred: [deletion of old resource "v1/Service/kube-system/allow-tcp-egress" is still pending, deletion of old resource "v1/Service/kube-system/allow-udp-egress" is still pending]
    

    The CCM is in CrashLoopBackOff because of the invalid credentials and hence cannot delete the allow-tcp-egress and allow-udp-egress Services.

  3. Shoot owner updates the credentials with valid ones.

  4. The deletion continues to fail with the error from step 2.

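    The cloud-provider-config Secret never gets updated, because the deletion flow does not apply the configuration chart again. The CCM therefore keeps using the old, invalid credentials.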

What you expected to happen:
Deletion of a hibernated Shoot to succeed once the credentials are updated with valid ones.
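
One way to get this behaviour would be to apply the configuration chart at the start of the deletion flow as well, so that the Secret picks up the current credentials before the shoot resources are cleaned up. The following is only a minimal sketch of that idea, not the actual extension library code. All names in it (ConfigApplier, ShootResourceDeleter, DeleteControlPlane, fakeSeed, simulateDeletion) are hypothetical, and the in-memory model only imitates the behaviour described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ConfigApplier is a hypothetical stand-in for applying the config chart
// that renders the cloud-provider-config Secret.
type ConfigApplier interface {
	ApplyConfig(ctx context.Context, namespace string) error
}

// ShootResourceDeleter is a hypothetical stand-in for deleting the shoot
// resources and waiting until they are gone.
type ShootResourceDeleter interface {
	DeleteShootResources(ctx context.Context, namespace string) error
}

// DeleteControlPlane sketches a deletion flow that refreshes the config
// (and with it the credentials) before cleaning up the shoot resources.
func DeleteControlPlane(ctx context.Context, cfg ConfigApplier, del ShootResourceDeleter, namespace string) error {
	if cfg != nil {
		if err := cfg.ApplyConfig(ctx, namespace); err != nil {
			return fmt.Errorf("could not apply configuration chart in namespace %q: %w", namespace, err)
		}
	}
	if err := del.DeleteShootResources(ctx, namespace); err != nil {
		return fmt.Errorf("could not delete shoot resources in namespace %q: %w", namespace, err)
	}
	return nil
}

// errStillPending imitates the "deletion of old resource ... is still pending" failure.
var errStillPending = errors.New("deletion of old resources is still pending")

// fakeSeed is an in-memory model used for illustration only.
type fakeSeed struct {
	ownerCredentials  string // what the Shoot owner currently provides
	secretCredentials string // what the cloud-provider-config Secret contains
}

// ApplyConfig copies the owner's current credentials into the Secret.
func (s *fakeSeed) ApplyConfig(_ context.Context, _ string) error {
	s.secretCredentials = s.ownerCredentials
	return nil
}

// DeleteShootResources succeeds only if the Secret holds valid credentials,
// mirroring a CCM that cannot clean up Services without them.
func (s *fakeSeed) DeleteShootResources(_ context.Context, _ string) error {
	if s.secretCredentials != "valid" {
		return errStillPending
	}
	return nil
}

// simulateDeletion runs the sketch against the in-memory model and returns
// the credentials the Secret ends up with and the deletion error, if any.
func simulateDeletion(ownerCreds, secretCreds string) (string, error) {
	seed := &fakeSeed{ownerCredentials: ownerCreds, secretCredentials: secretCreds}
	err := DeleteControlPlane(context.Background(), seed, seed, "shoot--foo--test")
	return seed.secretCredentials, err
}
```

In this model, simulateDeletion("valid", "invalid") corresponds to step 3 above: the owner has fixed the credentials while the Secret is still stale, and the deletion goes through because the config is applied first. With simulateDeletion("invalid", "invalid") the deletion still fails, which is the expected outcome as long as the credentials themselves are wrong.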

How to reproduce it (as minimally and precisely as possible):
See above.

Anything else we need to know?:
N/A

Environment:

  • Gardener version (if relevant): v1.32.0
  • Extension version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others: