Error: Unexpected exit code: 255 signal: null

Question

vaidkaran opened this issue 9 months ago

Description

I'm running jobs in parallel using Bull's concurrency option.
When I queue a lot of jobs, I randomly see this error in the bullUI for the failed job:

Error: Unexpected exit code: 255 signal: null
    at ChildProcess.exitHandler (/app/node_modules/bull/lib/process/sandbox.js:46:13)
    at ChildProcess.emit (node:events:538:35)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:291:12)
    at Process.callbackTrampoline (node:internal/async_hooks:130:17)

The error is not consistent; it comes up randomly on different jobs, but only when a lot of jobs are being processed in the queue.
I can tell from the logs that when it crashes, the job fails before it starts being processed.

I did some investigation and I think it has to do with my jobs outputting a lot of logs.
If a job writes more to stdout than the maxBuffer size, Node.js detaches from the child process with a null exit code.
A potential fix could be to increase the maxBuffer size.
Read the second answer here in this post

Is it possible to pass the maxBuffer option via bull somehow?
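
For reference, here is a minimal sketch of how Node.js itself treats maxBuffer, independent of Bull. It uses spawnSync from node:child_process, one of the calls for which maxBuffer is documented (alongside exec and execFile); the helper name runChild and the byte counts are made up for illustration. Nothing in this thread confirms that Bull's sandbox goes through the same limit, so treat it as a demonstration of the Node.js behaviour only, not as a diagnosis of the error above.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: start a child Node.js process that writes `bytes`
// characters to stdout, and cap the captured output at `maxBuffer` bytes.
function runChild(bytes, maxBuffer) {
  const script = `process.stdout.write('x'.repeat(${bytes}))`;
  // process.execPath is the Node.js binary running this script.
  return spawnSync(process.execPath, ['-e', script], {
    maxBuffer,
    encoding: 'utf8',
  });
}

// Output stays under the cap: the child exits normally.
const small = runChild(1024, 64 * 1024);
console.log('small:', small.status, small.error ? small.error.code : 'no error');

// Output exceeds the cap: Node.js terminates the child and reports an error.
const big = runChild(256 * 1024, 64 * 1024);
console.log('big:', big.status, big.error ? big.error.code : 'no error');
```

In the second call the child is terminated by Node.js and its output is truncated, so the parent never sees a normal exit for it.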

Bull version

4.10.2

stale · Answer 1 · Mon Oct 16 2023 06:04:38 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Manuel Ghizzoni · Answer 2 · Mon Nov 06 2023 01:27:47 GMT+0800 (China Standard Time)

I'm having the same issue. @vaidkaran have you found any solution to the problem?

Nick · Answer 3 · Thu Dec 21 2023 09:39:28 GMT+0800 (China Standard Time)

Same issue here

Manuel Astudillo · Answer 4 · Fri Dec 22 2023 17:42:04 GMT+0800 (China Standard Time)

If you can, upgrade to BullMQ, as it has a much better child process implementation: https://github.com/taskforcesh/bullmq

Sanidhya Saraswat · Answer 5 · Sun Mar 31 2024 13:24:49 GMT+0800 (China Standard Time)

Same issue here even in BullMQ. I am using useWorkerThreads: true

Manuel Astudillo · Answer 6 · Sun Mar 31 2024 18:51:51 GMT+0800 (China Standard Time)

@sanidhya-saraswat can you provide more info? Since the issue is closed, no further action will be taken, so if you want us to look into it, please provide something we can go on.

Nick · Answer 7 · Tue Apr 02 2024 03:46:52 GMT+0800 (China Standard Time)

A report on what I did: I decided to migrate to BullMQ as @manast recommended. I don't get that error anymore (except when I make other mistakes, but that's on me). One Bull feature that made me refactor half of my project, and that might be the reason for this unexpected error, is the ability to wait for a job to be done (which is an anti-pattern). This feature can cause trouble when scaling up.

vaidkaran · Answer 8 · Tue Apr 02 2024 13:25:32 GMT+0800 (China Standard Time)

I've been able to get around this error by reducing the amount of logs a job outputs
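
One way to reduce the output, sketched under assumptions: a small wrapper that caps how many lines a single job may log and counts the rest as dropped. createCappedLogger is a hypothetical helper, not part of Bull or BullMQ, and the cap value is arbitrary; this is an illustration of the idea, not the exact change made here.

```javascript
// Hypothetical helper: forwards at most `maxLines` lines to `write`
// and silently counts everything after that as dropped.
function createCappedLogger(maxLines, write = (line) => console.log(line)) {
  let written = 0;
  let dropped = 0;
  return {
    log(line) {
      if (written < maxLines) {
        written += 1;
        write(line);
      } else {
        dropped += 1;
      }
    },
    // Report how many lines were written and how many were suppressed.
    summary() {
      return { written, dropped };
    },
  };
}

// Usage inside a job processor: create one logger per job.
const captured = [];
const logger = createCappedLogger(2, (line) => captured.push(line));
logger.log('step 1');
logger.log('step 2');
logger.log('step 3'); // over the cap, dropped
```

Logging the summary() result once at the end of the job keeps a record of how much output was suppressed.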