nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code

Home Page:https://kopf.readthedocs.io/

Timers not running when a filter is used.

cannibalisticcow opened this issue · comments

Long story short

The documentation here https://kopf.readthedocs.io/en/stable/timers/ states that filters can be applied to timer handlers. However, when doing this, the timer runs once on startup but then never runs again.

I'm looking to have a timer run every 3-6 hours, but I only want a single CR to be reconciled once every 24 hours.

To do this as cleanly as possible, I was looking to add a `when=` check on the timer rather than inside the main code block.

For testing, I dropped the timer interval down to 10 seconds and the check window down to 60 seconds.

Kopf version

1.35.3

Kubernetes version

1.24.0

Python version

3.10.6

Code

import kopf
from datetime import datetime

# 'config' and 'safe_get' are project-local and not shown in this snippet.


@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Turn off logging to kube events
    settings.posting.enabled = False
    settings.execution.max_workers = 1
    settings.persistence.finalizer = config.FINALIZER


def check_last_successful_run(**kwargs):
    success_time = safe_get(kwargs, "meta", "annotations", config.LAST_RECONCILE_TIMESTAMP)

    if not success_time:
        # If this annotation isn't there, we can assume this is the first run for this object.
        return True

    success_time = datetime.strptime(success_time, "%Y-%m-%dT%H:%M:%SZ")
    difference = datetime.utcnow() - success_time

    # Note: .total_seconds(), not .seconds, so differences longer than a day are counted too.
    if difference.total_seconds() > 60:
        return True

    # Still within the time frame in which the scan shouldn't run again, so skipping this run.
    return False


@kopf.timer(
    config.GROUP,
    config.VERSION,
    config.KIND,
    interval=10,  # 21600 - 6 hours
    when=check_last_successful_run,
)
def main_function(
    name,
    namespace,
    annotations,
    status,
    logger,
    spec,
    patch,
    **kwargs,
):
    logger.info("In main code block")

Logs

[2023-04-21 14:31:27,581] kopf.activities.star [INFO    ] Activity 'configure' succeeded.
[2023-04-21 14:31:27,602] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2023-04-21 14:31:27,957] kopf.activities.auth [INFO    ] Activity 'login_via_client' succeeded.
[2023-04-21 14:31:27,958] kopf._core.engines.a [INFO    ] Initial authentication has finished.
[2023-04-21 14:31:28,465] kopf.objects         [WARNING ] [testing/test] Operator running in targeted mode (targeting testing/test only.
[2023-04-21 14:31:29,378] kopf.objects         [INFO    ] [testing/test] Initial validating checks complete, Setting to Provisioning and moving on to create_or_update
[2023-04-21 14:31:29,973] kopf.objects         [INFO    ] [testing/test] manager - create or update for image scanner complete
[2023-04-21 14:31:30,426] kopf.objects         [INFO    ] [testing/test] Timer 'main_function' succeeded.

I would expect the timer to run again 10 seconds later; however, nothing happens.

Additional information

No response

Confirming the weird behaviour of on.timer + a when condition. Here is my example to reproduce:

import random

import kopf


def in_default_ns(namespace, **_):
    return namespace == "default"

def random_condition(**_):
    x = random.randint(0, 1)
    print(f"Random: {x}")
    return x

@kopf.on.timer("pods", when=kopf.all_([in_default_ns, random_condition]), registry=registry, interval=10)
async def task_by_timeout(**_):
    print("Timer triggered")

I have a single pod running in the default namespace, and the output looks like this:

Random: 1
Random: 0
Random: 0
Random: 0
Random: 1
Random: 1
Timer triggered
Random: 1
Random: 0
Random: 1
Random: 1
Timer triggered
Random: 1
Random: 1
Random: 1
Random: 1
Random: 1
Random: 0
Random: 1
Random: 1
Timer triggered
Random: 0
Random: 0
Random: 0
Random: 1
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered
Timer triggered

There is no correlation between the timer and the condition calls.

@NikPaushkin I'm afraid filtering callbacks for timers and daemons cannot be as dynamic as in your random_condition example. Kopf needs to figure out whether it should put a finalizer on a resource for which it runs a timer (or daemon). It does this by checking the filtering arguments and executing the callbacks when starting the timers/daemons.
If that changes randomly, things will not work properly.

@cannibalisticcow there are a few bugs in kopf related to using filtering callbacks with timers and daemons.
You're best off putting your filtering code inside the timer function.
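As a rough sketch of that workaround, the time-window check can live in a plain helper that the (unfiltered) timer handler calls on every tick. The annotation key and the `is_due` helper here are hypothetical names for illustration, not kopf API; the commented decorator shows where it would plug in:

```python
from datetime import datetime, timedelta

STAMP_FMT = "%Y-%m-%dT%H:%M:%SZ"
STAMP_KEY = "example.com/last-reconcile"  # hypothetical annotation key
REVISIT = timedelta(hours=24)

def is_due(annotations, now=None):
    """True if the CR has never been reconciled, or was last reconciled over REVISIT ago."""
    stamp = annotations.get(STAMP_KEY)
    if not stamp:
        return True  # no annotation yet: first run for this object
    now = now or datetime.utcnow()
    return now - datetime.strptime(stamp, STAMP_FMT) >= REVISIT

# Inside the timer handler (no when= filter, so the timer keeps firing):
#
# @kopf.timer("example.com", "v1", "myresources", interval=21600)  # every 6h
# def reconcile(annotations, patch, logger, **_):
#     if not is_due(annotations):
#         return  # still within the 24h window; skip this tick
#     ...do the work...
#     patch.metadata.annotations[STAMP_KEY] = datetime.utcnow().strftime(STAMP_FMT)
```

The timer then stays alive and fires every interval; only the work inside is gated.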

The bugs I've found are that once a timer or daemon is running, kopf no longer considers the filters.
To fully support filtering, kopf would have to re-evaluate the filters before each invocation of the handler, which it currently does not.
I have a patch that makes this work.
But it's somewhat unclear how such dynamic filter callbacks should interact with adding (or not adding) finalizers.

Maybe it would be best not to allow callback filters on timers and daemons at all, so that things behave more predictably.
You can always do the filtering inside your handler function and then either do something or not.

@nolar thoughts?

Thanks for the feedback @asteven. I figured that out, but I have to say that my idea with the randint call was to show that filters may depend on the external environment. If they can't depend on a dynamic environment, I would suggest:

  1. Add a strict requirement to the documentation for timers/daemons that the filters must be pure functions.
  2. Consider calling the filter each time before the timer fires. I don't see a problem with the finalizer here: the finalizer can simply be added for any resource that is bound to a timer, or it may not be added for timers at all.

P.S. Why I'm against filters inside the handler:

  1. I don't want extra logging, and I don't want to disable it manually for specific timers.
  2. Why do we need timers in this case? I can simply spawn asyncio tasks from on.create and on.resume if I have to write the filters inside the handler anyway.

I think this is an issue with documentation, which is not clear enough on one particular nuance:

The when=/field=/other decorator-level filters affect whether Kopf sees the resource at all, i.e. they are "visibility filters". If they return false, Kopf frees all the system resources and memory used for that object, and stops its timers & daemons.

As such, you cannot make them depend on randomizers, on time, or on any fluctuating conditions. They must depend only on the resource's content. They are re-evaluated only when a new event arrives from k8s for the resource. If nothing happens in k8s, Kopf will never re-evaluate the condition, simply because it does not remember that such a resource exists.

For dynamic conditions, you should put them either inside your timer functions (which produces logs, as you said), or inside the daemons (an easier alternative to manual task orchestration). With daemons, there will be no extra logs (unless the daemon fails and restarts).

Yes, I'm using daemons now, personally. The additions to the docs regarding the purity of filters and their lifecycle would be useful, for sure.