nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code

Home Page: https://kopf.readthedocs.io/

Monitor watchers with Liveness Probe

dheeg opened this issue

Keywords

No response

Problem

Occasionally, when the Kopf operator starts, one or more CRD watchers fail. In all known cases, the failure was caused by a Kubernetes API error.

This seems to be a critical moment during startup: a failed watcher does not recover automatically, and only a restart of the operator helps.

Is there a way to monitor the status of all expected watchers via @kopf.on.probe()? Alternatively, is there a way to crash Kopf if this happens?

Once this has happened, finalizers hang. I would like this situation to be resolved automatically.
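To make the probe idea concrete, here is a minimal sketch of a heartbeat tracker that a probe handler could consult. It does not inspect Kopf's watchers directly, since I am not aware of a public API for that; instead it assumes each expected resource is "alive" as long as some handler has recently reported activity for it. The class `WatcherHeartbeats`, its methods, and the resource name `others.example.com` are hypothetical names made up for illustration.

```python
import time
from typing import Callable, Dict, List


class WatcherHeartbeats:
    """Tracks when each expected resource last showed any handler activity.

    Hypothetical helper: it is not part of Kopf and only infers watcher
    health from how recently handlers were invoked.
    """

    def __init__(
        self,
        expected: List[str],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._max_age = max_age
        # Start counting for every expected resource at creation time, so a
        # watcher that never delivers anything eventually shows up as stale.
        started = clock()
        self._last_seen: Dict[str, float] = {name: started for name in expected}

    def beat(self, resource: str) -> None:
        """Record activity for a resource (call this from its handlers)."""
        self._last_seen[resource] = self._clock()

    def stale(self) -> List[str]:
        """Return the resources with no activity within max_age seconds."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self._max_age
        )

    def check(self) -> Dict[str, float]:
        """Raise if any resource is stale; otherwise return ages in seconds."""
        stale = self.stale()
        if stale:
            raise RuntimeError(f"No recent activity for: {', '.join(stale)}")
        now = self._clock()
        return {name: round(now - seen, 1) for name, seen in self._last_seen.items()}
```

In an operator, an `@kopf.on.event` handler for each resource could call `beat()`, and a function decorated with `@kopf.on.probe()` could return `check()`. Two caveats apply. A resource that legitimately receives no events will look stale, so `max_age` needs to fit the cluster's normal activity. And whether an exception raised in a probe handler makes the liveness endpoint report a failure should be verified against the installed Kopf version; if it does not, the probe handler could terminate the process itself and let Kubernetes restart the pod.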

Thanks for any sort of help.

One example

Final Exception:

  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIForbiddenError: ('things.example.com is forbidden: User "system:serviceaccount:operator:serviceaccount" cannot watch resource "things" in API group "example.com" at the cluster scope', {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'things.example.com is forbidden: User "system:serviceaccount:operator:serviceaccount" cannot watch resource "things" in API group "example.com" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'example.com', 'kind': 'things'}, 'code': 403})