mher / flower

Real-time monitor and web admin for Celery distributed task queue

Home Page: https://flower.readthedocs.io

Emitting metrics of the disabled worker

ad-m-ss opened this issue

Describe the bug

We run Celery on Kubernetes. After a deployment, workers that have been deleted are still reported as online in the metrics. We use FLOWER_PURGE_OFFLINE_WORKERS to clean up workers from the UI, but they persist in the metrics.

I can see where the metric should be updated:

    self.metrics.worker_online.labels(worker_name).set(0)

However, this does not seem to be sufficient. The dashboard also calls application.update_workers to refresh its data:

    if refresh:
        try:
            self.application.update_workers()
        except Exception as e:
            logger.exception('Failed to update workers: %s', e)

It then uses the last heartbeat to decide when a worker should be purged:

    if not last_heartbeat or timestamp - last_heartbeat > options.purge_offline_workers:
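To make the dashboard-side behaviour easier to reason about, the heartbeat condition quoted above can be isolated as a small pure function. This is only a sketch: the function name `should_purge` is hypothetical and not part of Flower, and the parameters simply mirror the names in the quoted condition, with the threshold assumed to be in seconds.

```python
from typing import Optional


def should_purge(last_heartbeat: Optional[float], timestamp: float,
                 purge_offline_workers: float) -> bool:
    """Return True when a worker is stale under the quoted check.

    A worker with no recorded heartbeat, or whose last heartbeat is older
    than the purge threshold, is eligible for purging.
    """
    return bool(not last_heartbeat
                or timestamp - last_heartbeat > purge_offline_workers)


if __name__ == "__main__":
    now = 1000.0
    print(should_purge(None, now, 60))   # no heartbeat recorded
    print(should_purge(990.0, now, 60))  # heartbeat 10 s old
    print(should_purge(900.0, now, 60))  # heartbeat 100 s old
```

The point of the sketch is that this check filters what the UI shows; nothing in it touches the metric for the same worker.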

To Reproduce
Steps to reproduce the behavior:

A detailed scenario has not been developed yet. Experiments will be developed after the initial discussion.

Expected behavior

Ideally:

  • emit "0" for a deleted worker for a specified period of time
  • remove the metric after a long period of worker inactivity.
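The two-stage behaviour described above could be sketched as a decision based on heartbeat age. Everything here is an assumption for illustration: the function names and both thresholds (`offline_after`, `remove_after`) are hypothetical and are not existing Flower options. With prometheus_client, a returned 0 or 1 would presumably map to `labels(worker_name).set(value)` and a dropped worker to `remove(worker_name)` on the gauge.

```python
from typing import Dict, Optional


def worker_online_value(last_heartbeat: Optional[float], now: float,
                        offline_after: float = 60.0,
                        remove_after: float = 3600.0) -> Optional[int]:
    """Decide what the worker-online metric should report for one worker.

    Returns 1 while the heartbeat is fresh, 0 while the worker is offline
    but still within the retention window, and None once the series
    should be removed entirely.
    """
    if last_heartbeat is None:
        return None  # never seen: nothing worth exposing
    age = now - last_heartbeat
    if age <= offline_after:
        return 1
    if age <= remove_after:
        return 0
    return None


def refresh_samples(heartbeats: Dict[str, Optional[float]],
                    now: float) -> Dict[str, int]:
    """Build the worker -> value mapping, dropping workers to be removed."""
    samples = {}
    for name, last_heartbeat in heartbeats.items():
        value = worker_online_value(last_heartbeat, now)
        if value is not None:
            samples[name] = value
    return samples


if __name__ == "__main__":
    now = 10_000.0
    heartbeats = {
        "worker-a": now - 5,       # fresh heartbeat
        "worker-b": now - 600,     # offline, inside retention window
        "worker-c": now - 86_400,  # inactive for a long time
    }
    print(refresh_samples(heartbeats, now))
```

Keeping the decision in a pure function means the thresholds can be tested without a running Celery cluster or a metrics registry.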

We might also consider exposing the last heartbeat value for all workers.

Screenshots

The list of workers in the UI:

image

The list of metrics:

image

Please note that two additional workers are reported as online.

System information
Output of python -c 'from flower.utils import bugreport; print(bugreport())' command

$ python -c 'from flower.utils import bugreport; print(bugreport())'
flower   -> flower:1.0.0 tornado:6.1 humanize:3.10.0
software -> celery:5.1.2 (sun-harmonics) kombu:5.1.0 py:3.9.6
            billiard:3.6.4.0 redis:3.5.3
platform -> system:Linux arch:64bit
            kernel version:5.4.196-108.356.amzn2.x86_64 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis://**REDACTED**.cache.amazonaws.com:6379/0

deprecated_settings: None

Probably the same as #1128?

Yes, my bad. I missed that issue when searching for duplicates myself.