kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)


When a non-leader shuts down, leader election runnables are started

pleshakov opened this issue

Hi folks,

Noticed the following behavior.

When a non-leader manager shuts down, it starts leader election runnables.

In our project, we see the following in the logs:

(this manager is a non-leader)
I0320 19:27:59.274077       7 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/ngf-nginx-gateway-fabric-leader-election...


{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for leader election runnables"}

(The next 5 lines are from the jobs started by the leader election runnables)

{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Writing last statuses"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Updating Gateway API statuses"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Updating Nginx Gateway status"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Starting cronjob"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Stopping cronjob"}

{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"controller-runtime.metrics","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"shutting down server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Wait completed, proceeding to shutdown the manager"}

Note that the context passed to the leader election runnables is closed shortly after they are started, but there is enough time for those jobs to do some work, for example, to send an HTTP request.
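To make the impact concrete, here is a minimal sketch of a leader-only runnable similar to the jobs in the logs above; the type name, logger text, and URL are hypothetical, but the `manager.Runnable` and `manager.LeaderElectionRunnable` interfaces are the real ones:

```go
package repro

import (
	"context"
	"log"
	"net/http"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// statusUpdater is a hypothetical leader-only runnable, similar to the
// jobs in the logs above.
type statusUpdater struct{}

// Compile-time checks that statusUpdater satisfies the manager interfaces.
var (
	_ manager.Runnable               = &statusUpdater{}
	_ manager.LeaderElectionRunnable = &statusUpdater{}
)

// NeedLeaderElection places this runnable in the manager's
// LeaderElection runnable group.
func (s *statusUpdater) NeedLeaderElection() bool { return true }

// Start is invoked by the manager. When the group is started during
// shutdown, everything before the first check of ctx still runs,
// including this outbound HTTP request.
func (s *statusUpdater) Start(ctx context.Context) error {
	log.Println("statusUpdater: writing last statuses")

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, "https://status.example.com/update", nil)
	if err != nil {
		return err
	}
	if _, err := http.DefaultClient.Do(req); err != nil {
		return err // the request may still complete before ctx is closed
	}

	<-ctx.Done() // on a non-leader shutting down, this returns almost immediately
	return nil
}
```

On a healthy leader this runnable would run for the manager's lifetime; in the shutdown path above, it is started and cancelled back to back.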

Looking at the controller-runtime code, this behavior occurs because when a runnable group is stopped via StopAndWait, it always calls r.Start first.

My understanding of the code is as follows (see the condensed excerpt below):

  1. cm.runnables.LeaderElection.StopAndWait(cm.shutdownCtx)
    - shuts down the leader election runnable group
  2. StopAndWait always starts the runnable group first if it has not been started before.
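For reference, here is a condensed version of StopAndWait from pkg/manager/runnable_group.go as I read it in v0.17.x (abbreviated, not self-contained; comments mine):

```go
// StopAndWait waits for all the runnables to finish before returning.
func (r *runnableGroup) StopAndWait(ctx context.Context) {
	r.stopOnce.Do(func() {
		defer close(r.ch)

		// This is the problematic call: if the group was never started
		// (e.g. this manager never became the leader), Start launches
		// every queued runnable right here, during shutdown.
		_ = r.Start(ctx)

		r.start.Lock()
		r.stopped = true // stop accepting new runnables
		r.start.Unlock()

		// Cancel the group's internal context. The runnables started
		// just above see their context closed almost immediately, but
		// by this point they are already running.
		r.cancel()

		done := make(chan struct{})
		go func() {
			defer close(done)
			r.wg.Wait() // wait for all the runnables to finish
		}()

		select {
		case <-done:
		case <-ctx.Done():
		}
	})
}
```

Because r.cancel() only happens after r.Start(ctx) returns, every queued leader election runnable is started and then immediately asked to stop.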

I wonder if this behavior is considered a bug.

Version used:

sigs.k8s.io/controller-runtime v0.17.2

/cc @vincepri

Definitely not intended.

I was recently chatting with @inteon about the same issue, and we agreed it's not intended behavior; I'll need to take a look next week.