celery queue takes too long to be cleaned after deleting a run
gusmith opened this issue
Deployment:
- anonlink 0.11.2
- entityservice v1.11.2 (dev branch)
- python 3.7.3

Deployed using docker-compose on a "smallish" machine (my laptop), without setting the `max-tasks-per-child` option to a small value.
Observation:
While running an experiment for multi party linkage (1M * 1M * 1M), the benchmark timed out as expected, which triggered the run deletion.
The run is correctly deleted from the database, but it took around 20 minutes for the workers to empty the celery queue of tasks meant to fail fast (each raising a DBResourceMissing as expected).
20 minutes seems WAY too long to simply discard tasks without computing anything. Each task still has to query the database, but it performs no comparison work.
I think we first need to measure where the time is being spent.
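As a starting point, the overhead per fail-fast task could be broken down with a small timing sketch like the one below. The names `check_run_active` and `compute_comparisons` are hypothetical stand-ins for the real task steps, and the `time.sleep` simulates a database round-trip; in the deployed service the timings would instead go to the worker logs or a metrics backend:

```python
import time
from collections import defaultdict

# Accumulated wall-clock time per task phase (hypothetical instrumentation).
phase_totals = defaultdict(float)

def timed(phase):
    """Decorator that records wall-clock time spent in each phase."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                phase_totals[phase] += time.perf_counter() - start
        return inner
    return wrap

@timed("db_check")
def check_run_active(run_id):
    # Stand-in for the postgres lookup that raises DBResourceMissing
    # once the run has been deleted. The sleep simulates query latency.
    time.sleep(0.01)
    return False

@timed("comparison")
def compute_comparisons(run_id):
    # Never reached for a deleted run; shown for completeness.
    pass

def comparison_task(run_id):
    if not check_run_active(run_id):
        return  # fail fast, as the worker does today
    compute_comparisons(run_id)

for _ in range(5):
    comparison_task("deleted-run")

print({phase: round(total, 3) for phase, total in phase_totals.items()})
```

If the `db_check` phase dominates, the fix is to make (or avoid) that lookup; if it doesn't, the time is going into celery's own dispatch overhead for hundreds of thousands of tiny tasks.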
What options come to mind to speed this up?
- The workers could check redis instead of postgres at the start of a task to make sure it is still "active".
- We could aggregate comparison tasks, e.g. after the first 10k tasks the rest get combined into a single job that checks the run is still valid before scheduling the next 10k...
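Both options could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain Python set stands in for redis (the real check would be something like `redis.sismember` against a set of active run ids maintained on run creation/deletion), and all function names are hypothetical, not the entityservice API:

```python
# Option 1: check a fast in-memory store before touching postgres.
# A Python set stands in for redis here so the sketch is self-contained.
active_runs = {"run-a", "run-b"}  # maintained on run create/delete

class DBResourceMissing(Exception):
    pass

def run_is_active(run_id):
    # O(1) membership check against redis instead of a postgres query.
    return run_id in active_runs

def comparison_task(run_id, chunk):
    if not run_is_active(run_id):
        # Fail fast without opening a database connection at all.
        raise DBResourceMissing(run_id)
    return len(chunk)  # placeholder for the actual comparison work

# Option 2: schedule in batches, re-checking validity between batches,
# so a deleted run never floods the queue with doomed tasks.
def schedule_in_batches(run_id, tasks, batch_size=10_000):
    for i in range(0, len(tasks), batch_size):
        if not run_is_active(run_id):
            break  # run deleted: remaining tasks are never enqueued
        yield tasks[i:i + batch_size]

active_runs.discard("run-b")  # simulate the run being deleted

try:
    comparison_task("run-b", chunk=[1, 2, 3])
except DBResourceMissing:
    print("task dropped fast")
```

Option 1 keeps the per-task cost but makes it cheap; option 2 removes most of the tasks entirely, at the price of a scheduling chain.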
Closed in #432