rq / rq-scheduler

A lightweight library that adds job scheduling capabilities to RQ (Redis Queue)

Scheduler leaks jobs

Chronial opened this issue · comments

In several situations, rq-scheduler leaves its jobs in Redis and never removes them, effectively leaking Redis storage.

Ways to make this happen:

  • Jobs without result_ttl:
    1. Call schedule() with interval, or cron() with any parameters, and pass a repeat to either of these
  • All jobs:
    1. Schedule a job in any way and cancel() it before it is enqueued (see the sketch after this list)
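
To make the second case concrete, here's a minimal sketch, assuming rq-scheduler's enqueue_at()/cancel() API and RQ's default rq:job:<id> key naming. If cancel() only removes the id from the scheduled-jobs sorted set, the job hash itself stays behind with no TTL:

from datetime import datetime, timedelta

from redis import Redis
from rq_scheduler import Scheduler

conn = Redis()
scheduler = Scheduler(connection=conn)

# Schedule a job an hour out, then cancel it before it is ever enqueued.
job = scheduler.enqueue_at(datetime.utcnow() + timedelta(hours=1), print, 'leaked')
scheduler.cancel(job)  # only ZREMs the id from the sorted set

print(conn.exists('rq:job:' + job.id))  # 1  -> the job hash is still in Redis
print(conn.ttl('rq:job:' + job.id))     # -1 -> and it has no expiry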

So, what happens if you use schedule() with interval but no result_ttl and no repeat?
I ask because, anecdotally, setting any result_ttl on a schedule(interval=x) job causes the scheduler to stop scheduling that job after a period of time.

This thread is pretty old, but it's good that it's still open.
Something very similar happens to me too. The scheduler seems (though I'm not sure of this yet) to leak jobs. I'm not using rq-scheduler directly but via django-rq-scheduler, which uses django-rq, which in turn uses rq-scheduler. I've investigated the code from all of those packages, and I'm fairly sure the only way for a job NOT to be scheduled again is for the scheduler to stop while executing enqueue_job. Let me explain this in detail:

  1. From what I've seen, calling schedule(**kwargs) creates a NEW job, adds the interval and repeat params to the job's meta field, and then self.connection.zadd(self.scheduled_jobs_key, {job.id: to_unix(scheduled_time)}) adds a new element to the self.scheduled_jobs_key sorted set with scheduled_time as its score (see the inspection sketch after the snippet below).
  2. If repeat=None, a single call to schedule(...) makes the job run indefinitely. That's because of the scheduler's run method.
  3. The run method calls enqueue_jobs, which fetches all the due jobs and then calls enqueue_job for each of them.
  4. enqueue_job does something like this:
# interval and repeat were stored in job.meta by schedule()
interval = job.meta.get('interval', None)
repeat = job.meta.get('repeat', None)

queue = self.get_queue_for_job(job)
queue.enqueue_job(job)
self.connection.zrem(self.scheduled_jobs_key, job.id)

if interval:
    # If this is a repeat job and counter has reached 0, don't repeat
    if repeat is not None:
        if job.meta['repeat'] == 0:
            return
    self.connection.zadd(self.scheduled_jobs_key, {job.id: to_unix(datetime.utcnow()) + int(interval)})
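
As a side note on step 1, here's a small inspection sketch, assuming rq-scheduler's schedule() API and that meta holds interval/repeat as described above:

from datetime import datetime

from redis import Redis
from rq_scheduler import Scheduler

conn = Redis()
scheduler = Scheduler(connection=conn)

# schedule() creates the job and zadds its id, scored by the next run time.
job = scheduler.schedule(datetime.utcnow(), print, args=('tick',), interval=60, repeat=3)
print(job.meta)                                           # expect {'interval': 60, 'repeat': 3}
print(conn.zscore(scheduler.scheduled_jobs_key, job.id))  # unix timestamp of the next run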

I believe the problem can only manifest between the zrem and the zadd. What I'm thinking: if for any reason a SIGKILL, for example, arrives between the two, the process is killed, zadd never executes, and this 'periodic' job is not 'periodic' anymore.

Is this piece of code run as an atomic transaction? From what I've seen I don't think so, and someone with more experience with the repo might help me out (🙏 @selwin 🙏).
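
For what it's worth, here's a sketch (not the library's actual code; reschedule_atomically is a hypothetical helper) of how the ZREM/ZADD pair could be wrapped in a redis-py MULTI/EXEC pipeline so a crash can't land between the two commands:

import time

from redis import Redis


def reschedule_atomically(conn: Redis, scheduled_jobs_key: str, job_id: str, interval: int) -> None:
    """Remove the job id and re-add it with its next run time in a single
    MULTI/EXEC transaction: both commands apply, or neither does."""
    next_run = int(time.time()) + interval
    with conn.pipeline() as pipe:  # transaction=True by default
        pipe.zrem(scheduled_jobs_key, job_id)
        pipe.zadd(scheduled_jobs_key, {job_id: next_run})
        pipe.execute()

With this shape, a SIGKILL before execute() leaves the entry in the sorted set with its old score, so the job would simply be picked up and enqueued again on the next cycle; the failure mode shifts from 'silently dropped' to 'possibly run twice', which no longer loses the schedule.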