Worker crash with: Didn't find right amount of data in todo!: null does not create .error file
leadbi opened this issue
Describe the bug
While reviewing the logs, we noticed many worker restarts caused by this crash:
2023-06-19T05:37:45.420Z [CRIT] [-] [outbound] Didn't find right amount of data in todo!: null
2023-06-19T05:37:45.423Z [CRIT] [-] [core] Error [ERR_UNHANDLED_ERROR]: Unhandled error. ("Didn't find right amount of data in todo!: null")
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at new NodeError (node:internal/errors:387:5)
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at HMailItem.emit (node:events:502:17)
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at /opt/swiftmta/node_modules/Haraka/outbound/hmail.js:114:26
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at ReadStream.<anonymous> (/opt/swiftmta/node_modules/Haraka/outbound/hmail.js:153:13)
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at ReadStream.emit (node:events:513:28)
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at endReadableNT (node:internal/streams/readable:1358:12)
2023-06-19T05:37:45.423Z [CRIT] [-] [core] at processTicksAndRejections (node:internal/process/task_queues:83:21)
The .error file was not created, so the workers crashed over and over again.
Expected behavior
When this error happens, a .error file should be created in the queue.
Observed behavior
The .error file was not created, causing workers to crash over and over again.
Steps To Reproduce
This can be reproduced by corrupting a file in the queue.
We are not sure what corrupted the queue file in the first place.
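One way to corrupt a queue file for a reproduction is to truncate it. The helper below is only a sketch: `truncateQueueFile` is a hypothetical name, the path and byte count in the usage note are placeholders, and it is an assumption (not verified here) that truncation produces this exact error rather than a different one.

```javascript
const fs = require('node:fs');

// Hypothetical helper, not part of Haraka: cut a file down to at most
// `keepBytes` bytes so that whatever reads it later finds less data
// than it expects.
function truncateQueueFile(filePath, keepBytes) {
    const { size } = fs.statSync(filePath);
    const newSize = Math.min(size, keepBytes);
    fs.truncateSync(filePath, newSize);
    return { before: size, after: newSize };
}
```

This should only be run against a copy of a queue file in a test instance, for example `truncateQueueFile('/path/to/test-queue/some-file', 10)`, and never against a production queue.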
System Info:
Haraka | 2.8.28
Node | v16.19.0
OS | Linux mta1.mailgyn.com 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
openssl | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
Additional context
We use Haraka to send outbound email.
I think this issue is caused by this line:
Line 113 in 5c30916
I think emit should be called after the file is renamed.
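To illustrate why the ordering matters, here is a minimal sketch using a plain Node.js `EventEmitter`. It is not Haraka's code: `QueueItem`, `quarantine`, `failEmitFirst` and `failRenameFirst` are hypothetical names, and the `renamed` flag stands in for renaming the queue file to a .error file. It assumes, as suggested above, that the rename currently happens after the emit and that no `error` listener is attached at that point.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a queue item; not Haraka's actual HMailItem.
class QueueItem extends EventEmitter {
    constructor() {
        super();
        this.renamed = false; // stands in for "queue file renamed to .error"
    }

    // Stand-in for the real rename of the corrupt queue file.
    quarantine() {
        this.renamed = true;
    }

    // Assumed current ordering: emit first, quarantine second.
    failEmitFirst(msg) {
        this.emit('error', msg); // throws when no 'error' listener exists
        this.quarantine();       // never reached in that case
    }

    // Proposed ordering: quarantine first, then emit.
    failRenameFirst(msg) {
        this.quarantine();
        this.emit('error', msg);
    }
}

// Call one of the methods above and report the error code it throws, if any.
function tryFail(item, method) {
    try {
        item[method]("Didn't find right amount of data in todo!: null");
        return null;
    } catch (err) {
        return err.code;
    }
}

const emitFirst = new QueueItem();
const emitFirstCode = tryFail(emitFirst, 'failEmitFirst');

const renameFirst = new QueueItem();
const renameFirstCode = tryFail(renameFirst, 'failRenameFirst');

console.log(emitFirstCode, emitFirst.renamed);
console.log(renameFirstCode, renameFirst.renamed);
```

In both orderings the emit still throws ERR_UNHANDLED_ERROR, but with the rename done first the bad file has already been set aside, so a restarted worker would not pick it up again. Attaching an `error` listener would avoid the throw altogether; whether that fits Haraka's design is a separate question.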