nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code

Home Page: https://kopf.readthedocs.io/

Kopf crashes when there are disabled APIServers.

mehrdad-khojastefar opened this issue

Long story short

I ran into this issue while trying to run my operator inside a Kubernetes cluster that uses Linkerd (linkerd.io) as its service mesh. Linkerd is not set up correctly there, so the team decided to disable the API via --runtime-config. As a result, /apis/tap.linkerd.io/v1alpha1/ now returns 503 errors.
I would like this error to be ignored, as kubectl does: when I list pods, it prints a short warning that tap.linkerd.io is not available and then shows the list of pods.
Kopf, however, keeps crashing.

I have tried settings.scanning.disabled = True, but that did not help, although from reading the docs I expected it to.
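
The kubectl-like behaviour asked for above (warn about the unavailable group, carry on with the rest) could be sketched roughly as follows. This is only an illustration using the standard library, not Kopf's actual scanning code: `GroupUnavailable`, `scan_tolerantly`, and `fake_fetch` are hypothetical names, and the fake fetcher stands in for real discovery requests.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Iterable, List

logger = logging.getLogger("discovery-sketch")


class GroupUnavailable(Exception):
    """Hypothetical error for an API group version that answers with HTTP 503."""


# Hypothetical fetcher type: takes "group/version", returns its resource names.
Fetcher = Callable[[str], Awaitable[List[str]]]


async def scan_tolerantly(
    group_versions: Iterable[str],
    fetch: Fetcher,
) -> Dict[str, List[str]]:
    """Scan all group versions; warn about unavailable ones instead of failing."""
    names = list(group_versions)
    # return_exceptions=True keeps one failing group from aborting the others.
    results = await asyncio.gather(
        *(fetch(name) for name in names), return_exceptions=True
    )
    found: Dict[str, List[str]] = {}
    for name, result in zip(names, results):
        if isinstance(result, GroupUnavailable):
            logger.warning("Skipping unavailable API group version: %s", name)
        elif isinstance(result, BaseException):
            raise result  # any other error is still treated as fatal
        else:
            found[name] = result
    return found


async def fake_fetch(group_version: str) -> List[str]:
    """Stand-in for a discovery request; fails only for tap.linkerd.io."""
    if group_version.startswith("tap.linkerd.io/"):
        raise GroupUnavailable(group_version)
    return ["pods"] if group_version == "v1" else ["deployments"]
```

Calling `asyncio.run(scan_tolerantly(["v1", "apps/v1", "tap.linkerd.io/v1alpha1"], fake_fetch))` returns the resources of the two healthy group versions and logs a warning for the third, instead of raising.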

Kopf version

1.36.2

Kubernetes version

1.23.13

Python version

3.11

Code

No response

Logs

/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py:179: FutureWarning: Absence of either namespaces or cluster-wide flag will become an error soon. For now, switching to the cluster-wide mode for backward compatibility.
  warnings.warn("Absence of either namespaces or cluster-wide flag will become an error soon."
[2023-10-29 18:31:15,038] kopf.activities.star [INFO    ] Activity 'startup_config' succeeded.
[2023-10-29 18:31:15,128] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2023-10-29 18:31:15,130] kopf.activities.auth [INFO    ] Activity 'login_fn' succeeded.
[2023-10-29 18:31:15,130] kopf._core.engines.a [INFO    ] Initial authentication has finished.
[2023-10-29 18:31:16,932] kopf._core.reactor.o [ERROR   ] Request attempt #1/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:19,133] kopf._core.reactor.o [ERROR   ] Request attempt #2/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:20,236] kopf._core.reactor.o [ERROR   ] Request attempt #3/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:22,335] kopf._core.reactor.o [ERROR   ] Request attempt #4/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:25,347] kopf._core.reactor.o [ERROR   ] Request attempt #5/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:30,437] kopf._core.reactor.o [ERROR   ] Request attempt #6/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:38,454] kopf._core.reactor.o [ERROR   ] Request attempt #7/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:31:51,477] kopf._core.reactor.o [ERROR   ] Request attempt #8/9 failed; will retry: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:32:12,502] kopf._core.reactor.o [ERROR   ] Request attempt #9/9 failed; escalating: GET https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1 -> APIServerError(None, None)
[2023-10-29 18:32:12,538] kopf._core.reactor.r [ERROR   ] Resource observer has failed: (None, None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
    await coro
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/observation.py", line 113, in resource_observer
    resources = await scanning.scan_resources(groups=group_filter, settings=settings, logger=logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 31, in scan_resources
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 83, in _read_new_apis
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 97, in _read_version
    rsp = await api.get(url, settings=settings, logger=logger)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 111, in get
    response = await request(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
    return await fn(*args, **kwargs, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
    await errors.check_response(response)  # but do not parse it!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIServerError: (None, None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('https://kubernetes.default.svc/apis/tap.linkerd.io/v1alpha1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/kopf", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/cli.py", line 60, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/cli.py", line 109, in run
    return running.run(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 81, in run
    asyncio.run(coro)
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 138, in operator
    await run_tasks(operator_tasks, ignored=existing_tasks)
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py", line 419, in run_tasks
    await aiotasks.reraise(root_done | root_cancelled | hung_done | hung_cancelled)
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 238, in reraise
    task.result()  # can raise the regular (non-cancellation) exceptions.
    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 108, in guard
    await coro
  File "/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/observation.py", line 113, in resource_observer
    resources = await scanning.scan_resources(groups=group_filter, settings=settings, logger=logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 31, in scan_resources
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 83, in _read_new_apis
    resources.update(await coro)
                     ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 605, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/scanning.py", line 97, in _read_version
    rsp = await api.get(url, settings=settings, logger=logger)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 111, in get
    response = await request(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
    return await fn(*args, **kwargs, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
    await errors.check_response(response)  # but do not parse it!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIServerError: (None, None)

Additional information

No response

@mehrdad-khojastefar I'm also facing the exact same issue in our k8s cluster

@mehrdad-khojastefar were you able to find any workaround for this issue?

@prabhatkgupta I was able to fix it; you can take a look here: https://github.com/mehrdad-khojastefar/kopf
As you can tell, I haven't had time to turn it into a proper pull request :). I am using this version in production and it hasn't had any problems since. Please review it and use it with caution; I don't suggest using it everywhere without testing.
I will make it a proper pull request in the upcoming weeks.

@mehrdad-khojastefar how can I use your code in my Docker image?

@prabhatkgupta
https://gist.github.com/javrasya/e95ade856ff42e4649972f8a54368459
This should help. You need to modify your requirements.txt file and rebuild your Docker image.

@mehrdad-khojastefar I tried to pip install from your GitHub repo and am facing the following issue:

Traceback (most recent call last):
 File "/usr/local/bin/kopf", line 5, in <module>
   from kopf.cli import main
 File "/usr/local/lib/python3.9/site-packages/kopf/__init__.py", line 117, in <module>
   from kopf._core.engines.admission import (
 File "/usr/local/lib/python3.9/site-packages/kopf/_core/engines/admission.py", line 14, in <module>
   from kopf._cogs.clients import creating, errors, patching
 File "/usr/local/lib/python3.9/site-packages/kopf/_cogs/clients/creating.py", line 3, in <module>
   from kopf._cogs.clients import api
 File "/usr/local/lib/python3.9/site-packages/kopf/_cogs/clients/api.py", line 55, in <module>
   ) -> aiohttp.ClientResponse | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
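
For context: this TypeError is what Python 3.9 raises when an `X | None` annotation (PEP 604 syntax) is evaluated at import time, since that syntax only works at runtime on Python 3.10 and later. The paths in the traceback show python3.9, whereas the original report used Python 3.11. A minimal sketch of a 3.9-compatible spelling follows; `FakeResponse` and `request` are hypothetical stand-ins, not the code from the fork.

```python
from typing import Optional


class FakeResponse:
    """Hypothetical stand-in for aiohttp.ClientResponse."""


# typing.Optional never applies the `|` operator to classes, so this
# annotation is valid on Python 3.9 as well as on newer versions.
def request(ok: bool) -> Optional[FakeResponse]:
    """Return a response object on success, or None otherwise."""
    return FakeResponse() if ok else None
```

Running the fork on Python 3.10 or newer, as in the original report, would presumably avoid this error without code changes, though that has not been verified here.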