OptimalBits / bull

Premium Queue package for handling distributed jobs and messages in NodeJS.

Error: Unexpected exit code: 255 signal: null

vaidkaran opened this issue

Description

I'm running parallel processes using Bull's concurrency option.
When I queue a lot of jobs, I randomly see this error in the Bull UI for the failed job.

Error: Unexpected exit code: 255 signal: null
    at ChildProcess.exitHandler (/app/node_modules/bull/lib/process/sandbox.js:46:13)
    at ChildProcess.emit (node:events:538:35)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:291:12)
    at Process.callbackTrampoline (node:internal/async_hooks:130:17)

The error is not consistent: it comes up randomly on different jobs, but only when a lot of jobs are being processed in the queue.
I can tell from the logs that when it crashes, the job fails before it starts getting processed.

I did some investigation and I think it has to do with my jobs outputting a lot of logs.
If a job writes more to stdout than the maxBuffer size allows, Node.js detaches from the child process with an exit code of null.
A potential fix could be to increase the maxBuffer size.
Read the second answer here in this post
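
To illustrate what maxBuffer does in Node's own `child_process` API, here is a minimal sketch. It does not show Bull's internals, and whether Bull's sandbox is affected by this limit is the reporter's hypothesis, not something confirmed in this thread. The helper name `runWithCap` is hypothetical.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: spawn a child Node process that writes `bytes`
// characters to stdout, with stdout capped at `cap` bytes via maxBuffer.
function runWithCap(bytes, cap) {
  try {
    const out = execFileSync(
      process.execPath,
      ['-e', `process.stdout.write('x'.repeat(${bytes}))`],
      { maxBuffer: cap, encoding: 'utf8' }
    );
    return { ok: true, length: out.length };
  } catch (err) {
    // When the cap is exceeded, Node terminates the child and throws.
    return { ok: false, code: err.code };
  }
}

console.log(runWithCap(10, 1024));   // output fits under the cap
console.log(runWithCap(5000, 1024)); // output exceeds the cap
```

The second call fails because the child produced more output than the cap allows; raising `maxBuffer` on that call makes it succeed.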

Is it possible to pass the maxBuffer option via bull somehow?

Bull version

4.10.2

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I'm having the same issue. @vaidkaran have you found any solution to the problem?

Same issue here

If you can, upgrade to BullMQ, as it has a much better child process implementation: https://github.com/taskforcesh/bullmq

Same issue here even in BullMQ. I am using useWorkerThreads: true

@sanidhya-saraswat can you provide more info? The issue is closed, which means no further action will be taken, so if you want us to look into it, please provide something we can go on.

A report about what I did: I decided to migrate to BullMQ like @manast recommended. I don't have that error anymore (except when I make other mistakes, but that's on me). One of the Bull features that made me refactor half of my project, and that might be the reason for this unexpected error, is the ability to wait for a job to be done (which is an anti-pattern). This feature can cause trouble when scaling up.

I've been able to get around this error by reducing the amount of logs a job outputs.
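
One way to reduce a job's log output is to cap it per job. The following is a minimal sketch, not something taken from this thread: `createCappedLogger`, the byte limit, and the truncation marker are all hypothetical names and values chosen for illustration.

```javascript
// Hypothetical helper: allows a job to write at most `maxBytes` of log
// output, then emits a single truncation marker and drops the rest.
function createCappedLogger(
  maxBytes,
  sink = (line) => process.stdout.write(line + '\n')
) {
  let written = 0;
  let truncated = false;

  return {
    // Returns true if the message was written, false if it was dropped.
    log(message) {
      if (truncated) return false;
      const line = String(message);
      const size = Buffer.byteLength(line) + 1; // +1 for the newline
      if (written + size > maxBytes) {
        truncated = true;
        sink('[log output truncated]');
        return false;
      }
      written += size;
      sink(line);
      return true;
    },
    get bytesWritten() {
      return written;
    },
  };
}

// Example: a 64 KiB budget; the fourth line no longer fits and is dropped.
const logger = createCappedLogger(64 * 1024);
for (let i = 0; i < 4; i++) {
  logger.log('x'.repeat(20000));
}
```

Inside a processor file, a logger like this could be created at the start of each job so that the budget applies per job rather than per process.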