OptimalBits / bull

Premium Queue package for handling distributed jobs and messages in NodeJS.

All repeatable jobs become paused on first run?

Linksku opened this issue · comments

Description

I've been using Bull to run repeatable jobs for years. However, I've recently noticed that jobs occasionally stop running. Even after restarting the server and re-adding all the Bull jobs, the job handlers don't run. Rebooting the machine fixes it; I'd assume flushing Redis would fix it too, but I haven't been able to try that.

When the server starts, I call queue.add. Using Bull Dashboard, I can see that the job is scheduled successfully. However, when it's supposed to run, it moves to the "paused" state without the process callback being called. Bull then never attempts to run the job again until the server restarts. On my dev machine, the server restarts every time I save a file, so I end up with thousands of paused jobs.

Without digging into the Bull code, my hypothesis is that this starts happening after I restart the server: Bull assumes the old server instance is still available, so it attempts to run the job on the old instance. If that's the case, how can I get Bull to run it on the new server instance instead? Am I supposed to run some cleanup function to tell Bull that the old server instance is no longer available?

I saw in another thread (#1739) that they fixed a similar issue by removing repeatable jobs before adding them. However, with multiple web servers sharing a single Redis server, that would remove and re-add jobs every time any server restarts.
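One way to soften the #1739 workaround on multi-server setups is to remove only repeatable schedules that no longer match the desired interval, so an unchanged restart is a no-op. This is a minimal sketch: `staleRepeatableKeys` is a hypothetical helper, written as a pure function around Bull's real `queue.getRepeatableJobs()` / `queue.removeRepeatableByKey(key)` APIs, assuming each entry exposes `name`, `every`, and an opaque `key` as in Bull 4.x.

```javascript
// Hypothetical helper: given the entries returned by Bull's
// queue.getRepeatableJobs() and the desired interval for one job name,
// pick the keys of stale schedules to pass to queue.removeRepeatableByKey().
// Pure function, so it needs no Redis connection to exercise.
function staleRepeatableKeys(existingJobs, name, every) {
  return existingJobs
    .filter((job) => job.name === name && job.every !== every)
    .map((job) => job.key);
}

// Intended wiring on server start (needs a live queue; shown for context only):
// const keys = staleRepeatableKeys(await queue.getRepeatableJobs(), name, 60 * 1000);
// for (const key of keys) await queue.removeRepeatableByKey(key);
// await queue.add(name, null, { repeat: { every: 60 * 1000 } });
```

Because re-adding a repeatable job with identical repeat options yields the same repeat key, only genuinely changed schedules get torn down.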

Minimal, Working Test code to reproduce the issue.

(An easy to reproduce test case will dramatically decrease the resolution time.)

  // Assumes a local Redis instance; queue and job names are placeholders.
  const Queue = require('bull');
  const queue = new Queue('repeat-test');
  const name = 'repeat-test-job';

  queue.process(name, () => {
    console.log('ran');
  });

  await queue.add(name, null, {
    repeat: { every: 60 * 1000 },
    timeout: 60 * 1000,
    removeOnComplete: true,
    removeOnFail: true,
  });

Bull version

4.11.5

I do not think this is a Bull issue; a queue can only enter the paused state if you paused it manually, either via some frontend or using the API.

You're right, pause() is called occasionally; I thought recreating the queue meant I didn't need to unpause it.
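Since the paused flag lives in Redis rather than in the Queue object, recreating the queue after a restart does not clear it. A minimal sketch of a startup guard, using Bull's real `queue.isPaused()` and `queue.resume()` methods (the `ensureRunning` helper itself is hypothetical):

```javascript
// Hypothetical startup helper: if a previous process left the queue
// paused, resume it so repeatable jobs start executing again.
// Returns true if a stale pause was cleared.
async function ensureRunning(queue) {
  const wasPaused = await queue.isPaused();
  if (wasPaused) {
    await queue.resume();
  }
  return wasPaused;
}

// Usage against a stub with the same shape as a Bull queue,
// so the logic can be exercised without Redis:
const stubQueue = {
  paused: true,
  async isPaused() { return this.paused; },
  async resume() { this.paused = false; },
};
ensureRunning(stubQueue).then((cleared) => {
  console.log('stale pause cleared:', cleared);
});
```

Calling this once per server start is idempotent: resuming an already-running queue changes nothing.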