nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code

Home Page: https://kopf.readthedocs.io/

Infinite watch-streams stop immediately for no obvious reason, although the watched resources do then spin up.

James-Hirst-1998 opened this issue

Long story short

When my cluster is being spun up, one of the pods uses the Kopf framework to watch for ServiceMonitors (a Prometheus Operator CRD) to arrive and then perform some further manipulation on them. In the failing case, the pod first finds the ServiceMonitor CRD and starts the watch-stream for those objects, but stops the watch-stream almost immediately (less than 0.1 s later, according to the log timestamps).

There are no logs explaining why it stops and no error code, and the pod is still marked as healthy. I have confirmed that the ServiceMonitors do get created in this failing case, so I am unsure why the infinite watch is terminated so quickly. In a healthy case, the logs show the watch-stream starting and, about 30 seconds later, the first ServiceMonitor being seen and the further manipulation being carried out.

Kopf version

1.36.2

Kubernetes version

1.28.3

Python version

3.8.10

Code

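import kopf


@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    """Configure kopf."""
    # Ask the API server to end each watch request after 60 s...
    settings.watching.server_timeout = 60
    # ...and abort it on the client side after 70 s if the server has not.
    settings.watching.client_timeout = 70


@kopf.on.create("servicemonitor")
async def servicemonitor_create_fn(spec, namespace, logger, **_kwargs):
    """Trigger function for a ServiceMonitor being created."""
    # The real logic is stripped out; in the failing case this is never reached.
    logger.debug("A service monitor has been created")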

Logs

{"message": "Starting Kopf 1.36.2.", "timestamp": "2024-01-10T18:11:44.808848+00:00", "severity": "debug"} 
{"message": "Activity 'configure' is invoked.", "timestamp": "2024-01-10T18:11:44.809096+00:00", "severity": "debug"} 
{"message": "Activity 'configure' succeeded.", "timestamp": "2024-01-10T18:11:44.810125+00:00", "severity": "info"}
{"message": "Initial authentication has been initiated.", "timestamp": "2024-01-10T18:11:44.810619+00:00", "severity": "info"} 
{"message": "Activity 'login_with_service_account' is invoked.", "timestamp": "2024-01-10T18:11:44.810795+00:00", "severity": "debug"}
{"message": "Activity 'login_with_service_account' succeeded.", "timestamp": "2024-01-10T18:11:44.811506+00:00", "severity": "info"}
{"message": "Initial authentication has finished.", "timestamp": "2024-01-10T18:11:44.811631+00:00", "severity": "info"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:45.505721+00:00", "severity": "warn"} 
{"message": "Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.", "timestamp": "2024-01-10T18:11:45.506829+00:00", "severity": "debug"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:57.826303+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:11:57.829433+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:02.336769+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:02.347011+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.109591+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.111244+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.114258+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.210684+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.213385+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.214971+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.307980+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:42.319059+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.655493+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.661128+00:00", "severity": "warn"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.709665+00:00", "severity": "warn"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.805993+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:12:59.916102+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.005047+00:00", "severity": "warn"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.214595+00:00", "severity": "warn"} 
{"message": "Starting the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.311233+00:00", "severity": "debug"} 
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"} 
{"message": "Stopping the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.404687+00:00", "severity": "debug"}
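To put a number on "almost immediately", the start and stop records for the ServiceMonitor watch-stream can be parsed out of the JSON log. The helper below is a hypothetical diagnostic sketch, not part of Kopf; it assumes one JSON object per line with the `message`, `timestamp` and `severity` keys seen above, and it only uses the three relevant lines copied from the failing log.

```python
import json
from datetime import datetime

# The three relevant lines copied verbatim from the failing log above.
LOG = """{"message": "Starting the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.311233+00:00", "severity": "debug"}
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"}
{"message": "Stopping the watch-stream for servicemonitors.v1.monitoring.coreos.com cluster-wide.", "timestamp": "2024-01-10T18:13:00.404687+00:00", "severity": "debug"}"""


def watch_stream_lifetime(lines, resource):
    """Return (seconds the watch-stream was open, warnings logged while open)."""
    started = stopped = None
    warnings_while_open = 0
    for line in lines:
        record = json.loads(line)
        ts = datetime.fromisoformat(record["timestamp"])
        msg = record["message"]
        if msg.startswith("Starting the watch-stream for " + resource):
            started = ts
        elif msg.startswith("Stopping the watch-stream for " + resource):
            stopped = ts
        elif started is not None and stopped is None and record["severity"] == "warn":
            # A warning logged between "Starting" and "Stopping".
            warnings_while_open += 1
    if started is None or stopped is None:
        raise ValueError("start/stop lines not found for " + resource)
    return (stopped - started).total_seconds(), warnings_while_open


lifetime, warnings = watch_stream_lifetime(LOG.splitlines(), "servicemonitors")
print(f"{lifetime:.3f} s, {warnings} warning(s)")
```

For the failing run this reports a lifetime of about 0.093 s, with exactly one "Unresolved resources" warning logged while the stream was open.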

Additional information

We have found this framework super useful so far. We first saw this issue in Kopf 1.36.2, so I believe it could be related to this change: a499244

I have no Kubernetes API pods in my cluster to get logs from to help debug the issue. The code snippet above also strips out some of the servicemonitor_create_fn logic, because execution never gets as far as entering that function. In the meantime, I am looking for a workaround for this bug: is there a setting that would allow a retry, or make Kopf query the Kubernetes API less frequently? Any details on that would be great.

One final thing to note: in a healthy-cluster example, the equivalent of the final unresolved warning
{"message": "Unresolved resources cannot be served (try creating their CRDs): Selector(any_name='servicemonitor')", "timestamp": "2024-01-10T18:13:00.315617+00:00", "severity": "warn"}
does not appear after the watch-stream has started, so this may be a timing-window issue.
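The suspected timing window can be illustrated with a toy model. This is not Kopf's code and makes no claim about its internals; it assumes, hypothetically, that the set of watched resources is recomputed from each discovery result in the order the results are processed, and that a result computed before the ServiceMonitor CRD existed can be processed after one computed once it did. `DiscoveryResult` and `replay` are illustrative names only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DiscoveryResult:
    """Hypothetical result of one scan for resolvable resources."""
    started_at: float         # when the scan began (arbitrary clock)
    resolved: FrozenSet[str]  # resource names that resolved in this scan


def replay(results: List[DiscoveryResult], skip_stale: bool) -> List[Tuple[str, str]]:
    """Apply results in arrival order; return the start/stop events produced."""
    watched: FrozenSet[str] = frozenset()
    newest_scan = float("-inf")
    events: List[Tuple[str, str]] = []
    for result in results:
        if skip_stale and result.started_at < newest_scan:
            continue  # an older scan finished late: ignore it rather than undo newer state
        newest_scan = max(newest_scan, result.started_at)
        for name in sorted(result.resolved - watched):
            events.append(("start", name))
        for name in sorted(watched - result.resolved):
            events.append(("stop", name))
        watched = result.resolved
    return events


# A scan that sees the CRD arrives first; an older scan that did not see it arrives second.
arrivals = [
    DiscoveryResult(started_at=1.0, resolved=frozenset({"servicemonitors"})),
    DiscoveryResult(started_at=0.0, resolved=frozenset()),
]
print(replay(arrivals, skip_stale=False))  # start, then stop: the failing pattern
print(replay(arrivals, skip_stale=True))   # start only: the stale result is ignored
```

Under these assumptions, the naive replay yields the same start, unresolved, stop sequence as the failing log, while discarding results older than the newest one applied keeps the watch-stream open. It is only a sketch of the hypothesis above, not a confirmed diagnosis.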