data61 / anonlink-entity-service

Privacy Preserving Record Linkage Service

celery queue takes too long to be cleaned after deleting a run

gusmith opened this issue · comments

Deployment:

 'version': {'anonlink': '0.11.2',
             'entityservice': 'v1.11.2',  # dev branch
             'python': '3.7.3'}

Deployed using docker-compose on a "smallish" machine (my laptop).
The option max-tasks-per-child is not set to a small value.

Observation:

While running an experiment for multi-party linkage (1M * 1M * 1M), the benchmark timed out as expected, which triggered the run deletion.
The run is deleted from the database as expected, but it took around 20 minutes for the workers to drain the Celery queue of tasks that are meant to fail fast (each raising a DBResourceMissing as expected).

20 minutes seems WAY too long to simply discard tasks without computing anything. Each task still has to query the database, but performs no comparison work.

I think we first need to measure where the time is being spent.
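To get a first breakdown, a minimal stdlib-only sketch like the one below could time each phase of a failing task (the real workers would more likely hook Celery's task_prerun/task_postrun signals; the phase names and functions here are purely illustrative, not the service's actual code):

```python
import time
import functools
from collections import defaultdict

# Accumulated wall-clock time per labelled phase.
timings = defaultdict(float)

def timed(label):
    """Decorator that adds the duration of each call to timings[label]."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[label] += time.perf_counter() - start
        return inner
    return wrap

# Hypothetical phases of a fail-fast task.
@timed("db_check")
def check_run_exists(run_id):
    time.sleep(0.01)  # stand-in for the Postgres round trip
    return False      # the run was deleted

@timed("abort")
def abort_task(run_id):
    pass              # stand-in for raising DBResourceMissing and cleanup

# Simulate a handful of tasks hitting a deleted run.
for _ in range(5):
    if not check_run_exists("run-1"):
        abort_task("run-1")

print({phase: round(seconds, 3) for phase, seconds in timings.items()})
```

Comparing the per-phase totals would show whether the 20 minutes is dominated by the database check, by task setup/teardown, or by queue overhead.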

What options come to mind to speed this up?

  • The workers could check Redis instead of Postgres at the start of a task to make sure the run is still "active".
  • We could aggregate comparison tasks, e.g., after the first 10k tasks the rest get combined into a single job that checks the run is still valid before scheduling the next 10k...
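The first option might look like the sketch below, using an in-memory set as a stand-in for Redis (the real check would presumably be a redis-py SISMEMBER against a set of active run ids; the names here are hypothetical, not the service's API):

```python
class RunDeleted(Exception):
    """Raised so the worker can drop the task without touching Postgres."""

# Stand-in for a Redis set of active run ids.
active_runs = {"run-a"}

def comparison_task(run_id, chunk):
    # Cheap O(1) membership check before any database or compute work;
    # with Redis this is a single round trip instead of a Postgres query.
    if run_id not in active_runs:
        raise RunDeleted(run_id)
    return sum(chunk)  # placeholder for the actual similarity comparison
```

Once a run id is removed from the set, every queued task for it exits immediately, so draining the queue costs one Redis lookup per task rather than a database query.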

Closed in #432