data61 / anonlink-entity-service

Privacy Preserving Record Linkage Service

celery queue takes too long to be cleaned after deleting a run

gusmith opened this issue · comments

Deployment:

 'version': {'anonlink': '0.11.2',
             'entityservice': 'v1.11.2',  # dev branch
             'python': '3.7.3'}

Deployed using docker-compose on a "smallish" machine (my laptop).
The option max-tasks-per-child is not set to a small value.

Observation:

While running an experiment for multi-party linkage (1M * 1M * 1M), the benchmark timed out as expected, which triggered the run deletion.
The run is deleted from the database as expected, but it took around 20 minutes for the workers to drain the Celery queue of tasks that are meant to fail fast (each raising a DBResourceMissing as expected).

20 minutes seems WAY too long to simply discard tasks without computing anything. Each task still has to query the database, but performs no comparison work.

I think we first need to measure where the time is being spent.
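To get a first breakdown, a minimal stdlib-only sketch like the one below could time each phase of a failing task (the real workers would more likely hook Celery's task_prerun/task_postrun signals; the phase names and functions here are purely illustrative, not the service's actual code):

```python
import time
import functools
from collections import defaultdict

# Accumulated wall-clock time per labelled phase.
timings = defaultdict(float)

def timed(label):
    """Decorator that adds the duration of each call to timings[label]."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[label] += time.perf_counter() - start
        return inner
    return wrap

# Hypothetical phases of a fail-fast task.
@timed("db_check")
def check_run_exists(run_id):
    time.sleep(0.01)  # stand-in for the Postgres round trip
    return False      # the run was deleted

@timed("abort")
def abort_task(run_id):
    pass              # stand-in for raising DBResourceMissing and cleanup

# Simulate a handful of tasks hitting a deleted run.
for _ in range(5):
    if not check_run_exists("run-1"):
        abort_task("run-1")

print({phase: round(seconds, 3) for phase, seconds in timings.items()})
```

Comparing the per-phase totals would show whether the 20 minutes is dominated by the database check, by task setup/teardown, or by queue overhead.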

What options come to mind to speed this up?

  • The workers could check Redis instead of Postgres at the start of a task to make sure the run is still "active".
  • We could aggregate comparison tasks, e.g., after the first 10k tasks the rest get combined into a single job that checks the run is still valid before scheduling the next 10k...
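The first option might look like the sketch below, using an in-memory set as a stand-in for Redis (the real check would presumably be a redis-py SISMEMBER against a set of active run ids; the names here are hypothetical, not the service's API):

```python
class RunDeleted(Exception):
    """Raised so the worker can drop the task without touching Postgres."""

# Stand-in for a Redis set of active run ids.
active_runs = {"run-a"}

def comparison_task(run_id, chunk):
    # Cheap O(1) membership check before any database or compute work;
    # with Redis this is a single round trip instead of a Postgres query.
    if run_id not in active_runs:
        raise RunDeleted(run_id)
    return sum(chunk)  # placeholder for the actual similarity comparison
```

Once a run id is removed from the set, every queued task for it exits immediately, so draining the queue costs one Redis lookup per task rather than a database query.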

Closed in #432