OptimalBits / bull

Premium Queue package for handling distributed jobs and messages in NodeJS.

After a prolonged period, new jobs don't get executed

NITHISH-1609 opened this issue

Description:

I'm currently facing a critical issue with my Node.js server running in a production environment. The server hosts a Node.js application that utilizes Bull for job scheduling and processing. Initially, everything was functioning correctly, with jobs being scheduled and executed as expected. However, after a prolonged period of normal operation, the system has encountered a problem where new jobs are no longer being executed.

console-queue.ts:


export const consoleQueueProcess: ProcessCallbackFunction<any> = async (
  job: Job,
  done: DoneCallback,
): Promise<any> => {
  console.log(job.data);
  done();
};

queue.ts:

export const Queues = {
  consoleQueue: new Bull("console", {
    redis: redisConnectionOption,
    defaultJobOptions: {
      removeOnComplete: true,
      attempts: 10,
      backoff: {
        type: "exponential",
        delay: 15000, //start by delaying for 15 sec
      },
    },
  }),
};

export const initializeQueueProcess = () => {
  Queues.consoleQueue.process(consoleQueueProcess);
};

index.ts:

After the DB has been successfully initialized, we call initializeQueueProcess().

Bull version

"bull": "^4.10.4",

Additional information

I restarted my Node server and it started working. However, I remember that the first time I restarted it, it didn't work. When I restarted it a second time (after a gap of ~2 hrs), it worked.

This is not enough information for us to take any action. If this happens again, you should check whether the workers have connectivity with Redis. You can use the getWorkers API in Bull or the Redis command CLIENT LIST.
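
For anyone landing here later, the getWorkers check suggested above could be wrapped roughly like this. It is only a sketch: `describeWorkers` and the `WorkerSource` interface are hypothetical names, and the `addr` and `idle` fields are assumed to be passed through from Redis CLIENT LIST, so treat them as optional.

```typescript
// Minimal shape of what this helper needs from a queue.
// Assumption: a Bull queue's getWorkers() resolves to an array of
// string-keyed objects built from the Redis CLIENT LIST output.
interface WorkerSource {
  getWorkers(): Promise<Array<{ [key: string]: string }>>;
}

// Hypothetical helper: summarizes the workers Redis currently sees for a queue.
async function describeWorkers(queue: WorkerSource): Promise<string> {
  const workers = await queue.getWorkers();
  if (workers.length === 0) {
    return "no workers connected";
  }
  // Fall back to placeholders when a field is missing from the reply.
  const lines = workers.map(
    (w) => `${w.addr ?? "unknown"} idle=${w.idle ?? "?"}s`,
  );
  return `${workers.length} worker(s): ${lines.join(", ")}`;
}
```

With the code from this issue, it could be called as `describeWorkers(Queues.consoleQueue).then(console.log)` on a timer or from a health endpoint. A "no workers connected" result while the process is still running would point at a connectivity problem rather than at the job handler.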

Okay, @manast.
Couple of questions.

  • Let's say the queue's Redis connection is lost and then reconnects. Do I need to bind the worker to the queue again? In terms of my code, do I need to call initializeQueueProcess() again? If so, is there a callback function for this?

  • I experienced this locally. I added a console task with a 10 sec delay, terminated the Redis connection, waited ~20 sec, and turned Redis back on. I waited, expecting it to log the data. But when I added another console task, it was processed and logged. Is this expected behaviour? Is it because of the retry strategy that it didn't execute while I waited, but executed immediately once a new task was added?

  • More info on the described (prolonged) issue: I was adding many tasks to various queues, but none of them were processed. Once I restarted, they worked. It felt like the queues had lost their workers, so I thought I needed to re-initialize them, i.e. call initializeQueueProcess() again.

  • Let's say the queue's Redis connection is lost and then reconnects. Do I need to bind the worker to the queue again? In terms of my code, do I need to call initializeQueueProcess() again? If so, is there a callback function for this?

No, the Redis connection will be re-established automatically by default unless you override the default arguments.

  • I experienced this locally. I added a console task with a 10 sec delay, terminated the Redis connection, waited ~20 sec, and turned Redis back on. I waited, expecting it to log the data. But when I added another console task, it was processed and logged. Is this expected behaviour? Is it because of the retry strategy that it didn't execute while I waited, but executed immediately once a new task was added?

No, the original delayed job should be processed as soon as the connection to Redis is re-established, since the 10 seconds have already passed.

  • More info on the described (prolonged) issue: I was adding many tasks to various queues, but none of them were processed. Once I restarted, they worked. It felt like the queues had lost their workers, so I thought I needed to re-initialize them, i.e. call initializeQueueProcess() again.

If you are adding jobs while Redis is disconnected, you may want to throw an exception, as described in this pattern: https://docs.bullmq.io/patterns/failing-fast-when-redis-is-down
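
As a rough illustration of failing fast on the producer side, one option is to check the connection state before adding a job. Note that this is not the technique from the linked BullMQ page, which configures the connection itself. It is a simpler guard that assumes the queue exposes its ioredis connection as `client` and that ioredis reports its state in `status`. `addOrFail` and `QueueLike` are hypothetical names.

```typescript
// Minimal shape of what this guard needs from a queue.
// Assumption: `client` is the queue's ioredis connection, whose `status`
// is "ready" only while the connection is usable.
interface QueueLike<T> {
  client: { status: string };
  add(data: T): Promise<unknown>;
}

// Hypothetical helper: rejects immediately instead of queuing the command
// in memory while Redis is unreachable.
async function addOrFail<T>(queue: QueueLike<T>, data: T): Promise<unknown> {
  const status = queue.client.status;
  if (status !== "ready") {
    throw new Error(`Redis connection is "${status}", job not added`);
  }
  return queue.add(data);
}
```

The check is not atomic, so a connection that drops between the status check and the add can still leave the call waiting. A call made right after the queue is created may also be rejected, because the connection is not ready yet.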

The workers on the other hand should be able to reconnect and handle connection errors automatically: https://docs.bullmq.io/guide/going-to-production#automatic-reconnections
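
To see whether those automatic reconnections are actually happening in production, it may help to log the connection lifecycle. The sketch below assumes the connection is an ioredis client, which emits `reconnecting`, `ready`, `end` and `error` events. `logConnectionEvents` and `ConnectionEvents` are hypothetical names.

```typescript
// Minimal shape of an event-emitting Redis connection (ioredis fits this).
interface ConnectionEvents {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Hypothetical helper: logs connection lifecycle events so that a worker
// which silently stopped consuming jobs leaves a trace in the logs.
function logConnectionEvents(
  client: ConnectionEvents,
  log: (line: string) => void = console.log,
): void {
  client.on("reconnecting", () => log("redis: reconnecting"));
  client.on("ready", () => log("redis: ready"));
  // "end" means no further reconnection attempts will be made.
  client.on("end", () => log("redis: connection ended"));
  client.on("error", (err: Error) => log(`redis: error ${err.message}`));
}
```

With the code from this issue, this could be wired up as `logConnectionEvents(Queues.consoleQueue.client)`, assuming `client` is exposed on the queue. A queue holds more than one Redis connection, so this only covers the one you pass in.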

If you can, upgrade to BullMQ, as Bull is reaching EOL.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.