puma / puma

A Ruby/Rack web server built for parallelism

Home Page:https://puma.io


Slow workers respawn due to race condition in 4.x

sitano opened this issue · comments

Describe the bug

When something triggers a worker restart in 4.x, the master process can take up to Const::WORKER_CHECK_INTERVAL (5 seconds by default) to respawn the worker. That is too long for a worker to be absent under load.

To Reproduce

Kill a worker, then watch ps aux: the worker stays absent for up to Const::WORKER_CHECK_INTERVAL.

Expected behavior

Respawn the worker as soon as the parent observes SIGCHLD.
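
To illustrate what "as soon as the parent observes SIGCHLD" could look like, here is a minimal sketch of the generic self-pipe technique: a SIGCHLD handler writes a byte to a pipe, and the parent waits on that pipe with a timeout instead of sleeping blindly. This is not Puma's code and I am not claiming it is what the patch does; wait_for_child_exit and CHECK_INTERVAL are hypothetical names used only for this example, and it assumes a platform with fork.

```ruby
# Sketch only: generic self-pipe wakeup, not Puma's implementation.
# CHECK_INTERVAL and wait_for_child_exit are hypothetical names.
CHECK_INTERVAL = 5 # seconds, mirrors the default interval mentioned above

# Forks a short-lived child and waits up to `timeout` seconds for SIGCHLD.
# Returns [woken_by_signal, seconds_waited].
def wait_for_child_exit(timeout)
  reader, writer = IO.pipe
  # The handler only writes one byte; the waiting side does the real work.
  previous = Signal.trap("CHLD") { writer.write_nonblock("!") rescue nil }

  pid = fork { exit!(0) } # stand-in for a worker that terminates
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  # Returns early when the handler has written to the pipe, even if the
  # child exited before we got here; returns nil after `timeout` otherwise.
  ready = IO.select([reader], nil, nil, timeout)
  Process.wait(pid) # reap the child so it does not stay a zombie

  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [!ready.nil?, elapsed]
ensure
  Signal.trap("CHLD", previous || "DEFAULT")
  reader&.close
  writer&.close
end

woken, elapsed = wait_for_child_exit(CHECK_INTERVAL)
puts "woken by SIGCHLD: #{woken}, waited #{elapsed.round(3)}s"
```

With a wakeup like this, the parent can run its dead-worker check right after the child exits rather than at the end of the interval.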

Example

I, [2022-12-22T17:22:52.304877 #89305]  INFO : Worker idle timeout of 15 reached. Exiting... pid=89305
I, [2022-12-22T17:22:52.305826 #88491]  INFO : spawned <-- wasted check_workers spawn
... nothing happens here....
I, [2022-12-22T17:22:57.318201 #88491]  INFO : before spawn  <-- next round
I, [2022-12-22T17:22:57.325574 #88491]  INFO : forked
I, [2022-12-22T17:22:57.325631 #88491]  INFO : hooks
I, [2022-12-22T17:22:57.325647 #88491]  INFO : done
I, [2022-12-22T17:22:57.325678 #88491]  INFO : spawned
I, [2022-12-22T17:22:57.328877 #89334]  INFO : Server queue_requests=true, idle_timeout=10 pid=89334
[88491] - Worker 0 (pid: 89334) booted, phase: 0

Reason

A race between the master receiving the "t" command, the worker process actually exiting (SIGCHLD), and the parent checking for dead workers. Specifically, this branch:

                when "t"
                  w.term unless w.term?
                  force_check = true

races with the forced check in check_workers and with the arrival of SIGCHLD: check_workers runs before the worker has finished exiting, so it cannot detect the exit and the forced check is wasted. The worker is then only respawned on the next periodic check.
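
The timing can be sketched with a toy model. This is not Puma's code: respawn_time and its parameters are hypothetical names, and the model assumes the master sleeps a full WORKER_CHECK_INTERVAL after the wasted forced check unless SIGCHLD wakes it.

```ruby
# Toy model of the master loop timing; NOT Puma's actual code.
# respawn_time and its keyword arguments are hypothetical names.
WORKER_CHECK_INTERVAL = 5.0 # seconds, mirrors the default quoted above

# Returns the time at which the master notices the dead worker.
#   forced_check_at: when the check triggered by the "t" message runs
#   exit_at:         when the worker process has actually exited
#   wake_on_sigchld: whether SIGCHLD interrupts the master's sleep
def respawn_time(forced_check_at:, exit_at:, wake_on_sigchld:)
  # The forced check only helps if the worker is already gone.
  return forced_check_at if exit_at <= forced_check_at

  if wake_on_sigchld
    exit_at # the master wakes as soon as the child has exited
  else
    # The forced check found nothing, so the master sleeps a full interval.
    forced_check_at + WORKER_CHECK_INTERVAL
  end
end

# Worker announces termination at ~0s, is checked at 1 ms, exits at 10 ms.
slow = respawn_time(forced_check_at: 0.001, exit_at: 0.010, wake_on_sigchld: false)
fast = respawn_time(forced_check_at: 0.001, exit_at: 0.010, wake_on_sigchld: true)
puts "without SIGCHLD wakeup: respawn at #{slow}s"
puts "with SIGCHLD wakeup:    respawn at #{fast}s"
```

In this model the gap without a wakeup is roughly one full interval, which matches the 5 second pause between the two "spawned" lines in the log above.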

How to fix

Backport this patch 67f9b1f.

I am happy to help with the backport. Just let me know what you think.

I am not sure whether this version (4.x) is still supported; I am posting this here to let you know.

@dentarg OK, thank you for the info. I am closing this one then.