Support raise_signal_exceptions

Question

DanielStevenLewis opened this issue 6 months ago · comments

DanielStevenLewis commented 6 months ago

https://github.com/Betterment/delayed#migrating-from-delayedjob states "that some configurations, like queue_attributes, exit_on_complete, backend, and raise_signal_exceptions have been removed entirely." I think the lack of raise_signal_exceptions (and the reliance on the behaviour described in https://github.com/Betterment/delayed#running-a-worker-process) could prevent me from suggesting switching over from delayed_job to delayed. Would it be difficult to support raise_signal_exceptions and are there any concerns with the idea of supporting it?

John Mileham · Answer 1 · Tue Jan 23 2024 04:25:03 GMT+0800 (China Standard Time)

Can you say more about what your concerns are with the delayed behavior? Delayed's behavior prioritizes finishing jobs that have already begun, to the extent possible, before worker shutdown, in an attempt to avoid wasting work and to minimize job latency. It also leans into the assumption that not every job payload will have been implemented with ideal semantic idempotency. In our view, having a more opinionated and curated worker drain/deployment process is an advantage, but we would love to learn more about your context.

DanielStevenLewis · Answer 2 · Tue Jan 23 2024 05:42:38 GMT+0800 (China Standard Time)

We currently use Delayed::Worker.raise_signal_exceptions = :term with delayed_job. I'm hoping that we can switch over to delayed with minimal work/changes needed, and thereby benefit from the performance enhancements it has, as a quick win.
We restart the job servers whenever we deploy (every few days), and we have jobs that take many hours to run. I'm concerned that without this configuration option, after a deployment we'd have jobs that would take a very long time before they can retry.
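
For context, here is a minimal plain-Ruby sketch of the behavior this setting is relied on for: when TERM arrives, the signal handler raises a SignalException, so the in-flight job aborts right away instead of running to completion. This is not delayed_job's actual implementation. `run_with_term_raised` and the returned symbols are hypothetical names used only for illustration.

```ruby
# Hypothetical helper (not part of delayed_job or delayed): runs a block as
# if it were a job, with a TERM handler that raises into the main thread.
def run_with_term_raised
  # Signal.trap returns the previous handler so it can be restored afterwards.
  previous = Signal.trap('TERM') { raise SignalException, 'TERM' }
  yield
  :completed
rescue SignalException
  # A real worker would mark the job as failed and release its lock here,
  # so that another worker can pick it up and retry it.
  :aborted
ensure
  Signal.trap('TERM', previous || 'DEFAULT')
end

# Usage: the "job" signals its own process, then keeps working.
outcome = run_with_term_raised do
  Process.kill('TERM', Process.pid)
  sleep 5 # stand-in for a job that would otherwise run for hours
end
puts outcome
```

Without a handler that raises, the sleeping "job" above would simply keep running, which is the long wait before retry described here.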

Thanks for asking, @jmileham. Is there more information I should try to provide to better speak to your question?

John Mileham · Answer 3 · Tue Jan 23 2024 05:47:42 GMT+0800 (China Standard Time)

So you're looking to switch to delayed but would need to extend the job timeout, and aren't looking to implement a long-lived draining period in your infra coordination right away? Makes sense. I'll tag out now because @smudge will have smarter thoughts about where to go from here.

DanielStevenLewis · Answer 4 · Tue Jan 23 2024 05:49:52 GMT+0800 (China Standard Time)

Right! Thanks

Nathan Griffith · Answer 5 · Tue Jan 30 2024 01:04:34 GMT+0800 (China Standard Time)

I started looking into this on Friday, but I'll note that it's a little more complicated than simply adding the feature back. We removed it because it was incompatible with delayed's multithreading (where a single worker can claim & work off multiple jobs at once, configured via the max_claims option). Supporting raise_signal_exceptions in a way that would allow the individual job threads to rescue would require some extra signal-passing across threads that I haven't had a chance to explore yet.
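
To illustrate the kind of cross-thread signal-passing mentioned above, here is a rough plain-Ruby sketch that uses `Thread#raise` to deliver a SignalException to each job thread so that each one can rescue it individually. This is only one possible approach under simplified assumptions, not how delayed works or plans to work. `run_jobs_with_forwarded_signal` is a hypothetical name, and the signal is simulated rather than trapped from the operating system.

```ruby
# Hypothetical helper (not part of delayed): starts `job_count` job threads,
# then forwards a simulated signal to every one of them.
def run_jobs_with_forwarded_signal(job_count, signal_name = 'TERM')
  ready = Queue.new   # lets each thread report that it can rescue
  results = Queue.new # thread-safe collection of per-job outcomes

  threads = Array.new(job_count) do |i|
    Thread.new do
      begin
        ready << i # from here on, the rescue clause below applies
        sleep      # stand-in for a long-running job
        results << [i, 'finished']
      rescue SignalException => e
        # Each job thread gets its own chance to clean up or release its claim.
        results << [i, "interrupted by #{e.message}"]
      end
    end
  end

  # Wait until every job is inside its rescue scope, then forward the signal.
  job_count.times { ready.pop }
  threads.each { |t| t.raise(SignalException.new(signal_name)) }
  threads.each(&:join)

  Array.new(job_count) { results.pop }.sort
end

run_jobs_with_forwarded_signal(2).each { |id, outcome| puts "job #{id}: #{outcome}" }
```

In a real worker, the forwarding step would have to be triggered from the process's signal handler, and Ruby restricts what can safely run in trap context (for example, taking a Mutex), which is part of what makes this more involved than re-adding a setting.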