puma / puma

A Ruby/Rack web server built for parallelism

Home Page: https://puma.io

Puma cluster not reaping child processes when PID is 1 with Puma 6.4.1

stanhu opened this issue

Describe the bug

We have a separate fleet of Puma workers to handle ActionCable, and since upgrading to 6.4.1 we have seen a significant increase in unhealthy pods and "can't alloc thread" error messages. In addition, the pods' readiness checks started to fail, causing Kubernetes to periodically shut down and restart the Puma container.

We rolled back to 6.4.0 and saw a dramatic drop in errors. See https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17372 for the full details.

Puma config:

# frozen_string_literal: true

# Load "path" as a rackup file.
#
# The default is "config.ru".
#
rackup '/srv/gitlab/config.ru'
pidfile "#{ENV['HOME']}/puma.pid"
state_path "#{ENV['HOME']}/puma.state"

stdout_redirect '/srv/gitlab/log/puma.stdout.log',
  '/srv/gitlab/log/puma.stderr.log',
  true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
threads (ENV['PUMA_THREADS_MIN'] ||= '1').to_i, (ENV['PUMA_THREADS_MAX'] ||= '16').to_i

# By default, workers accept all requests and queue them to pass to handlers.
# When false, workers accept the number of simultaneous requests configured.
#
# Queueing requests generally improves performance, but can cause deadlocks if
# the app is waiting on a request to itself. See https://github.com/puma/puma/issues/612
#
# When set to false, this may require a reverse proxy to handle slow clients and
# queue requests before they reach Puma. This is due to disabling HTTP keepalive.
queue_requests false

# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.

# We want to provide the ability to individually control HTTP (`INTERNAL_PORT`) and
# HTTPS (`SSL_INTERNAL_PORT`):
#
# 1. HTTP on, HTTPS on: Since `INTERNAL_PORT` is configured, we listen on it.
# 2. HTTP on, HTTPS off: If we don't specify either port, we default to HTTP
#    because SSL requires a certificate and key to work.
# 3. HTTP off, HTTPS on: `SSL_INTERNAL_PORT` is enabled but
#   `INTERNAL_PORT` is not set.
http_port = ENV['INTERNAL_PORT'] || '8080'
http_addr =
  if ENV['INTERNAL_PORT'] || (!ENV['INTERNAL_PORT'] && !ENV['SSL_INTERNAL_PORT'])
    "0.0.0.0"
  else
    # If HTTP is disabled, we still need to listen to 127.0.0.1 for health checks.
    "127.0.0.1"
  end

bind "tcp://#{http_addr}:#{http_port}"

if ENV['SSL_INTERNAL_PORT']
  ssl_params = {
    cert: ENV['PUMA_SSL_CERT'],
    key: ENV['PUMA_SSL_KEY'],
  }

  ssl_params[:ca] = ENV['PUMA_SSL_CLIENT_CERT'] if ENV['PUMA_SSL_CLIENT_CERT']
  ssl_params[:key_password_command] = ENV['PUMA_SSL_KEY_PASSWORD_COMMAND'] if ENV['PUMA_SSL_KEY_PASSWORD_COMMAND']
  ssl_params[:ssl_cipher_filter] = ENV['PUMA_SSL_CIPHER_FILTER'] if ENV['PUMA_SSL_CIPHER_FILTER']
  ssl_params[:verify_mode] = ENV['PUMA_SSL_VERIFY_MODE'] || 'none'

  ssl_bind '0.0.0.0', ENV['SSL_INTERNAL_PORT'], ssl_params
end

workers (ENV['WORKER_PROCESSES'] ||= '3').to_i

require "/srv/gitlab/lib/gitlab/cluster/lifecycle_events"

on_restart do
  # Signal application hooks that we're about to restart
  Gitlab::Cluster::LifecycleEvents.do_before_master_restart
end

before_fork do
  # Signal application hooks that we're about to fork
  Gitlab::Cluster::LifecycleEvents.do_before_fork
end

Gitlab::Cluster::LifecycleEvents.set_puma_options @config.options
on_worker_boot do
  # Signal application hooks of worker start
  Gitlab::Cluster::LifecycleEvents.do_worker_start
end

on_worker_shutdown do
  # Signal application hooks that a worker is shutting down
  Gitlab::Cluster::LifecycleEvents.do_worker_stop
end

# Preload the application before starting the workers; this conflicts with the
# phased restart feature. (off by default)
preload_app!

tag 'gitlab-puma-worker'

# Verifies that all workers have checked in to the master process within
# the given timeout. If not, the worker process will be restarted. Default
# value is 60 seconds.
#
worker_timeout (ENV['WORKER_TIMEOUT'] ||= '60').to_i

# https://github.com/puma/puma/blob/master/5.0-Upgrade.md#lower-latency-better-throughput
wait_for_less_busy_worker (ENV['PUMA_WAIT_FOR_LESS_BUSY_WORKER'] ||= '0.001').to_f

# Use json formatter
require "/srv/gitlab/lib/gitlab/puma_logging/json_formatter"

json_formatter = Gitlab::PumaLogging::JSONFormatter.new
log_formatter do |str|
  json_formatter.call(str)
end

require "/srv/gitlab/lib/gitlab/puma/error_handler"

error_handler = Gitlab::Puma::ErrorHandler.new(ENV['RAILS_ENV'] == 'production')

lowlevel_error_handler do |ex, env, status_code|
  error_handler.execute(ex, env, status_code)
end

Command line:

/srv/gitlab/bin/bundle exec puma --environment production --config /srv/gitlab/config/puma.rb /srv/gitlab/config.ru

Example process list:

$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 20:07 ?        00:00:34 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           41       1  0 20:07 ?        00:00:21 /usr/local/bin/gitlab-logger /var/log/gitlab
git           60       1  0 20:07 ?        00:00:47 ruby /srv/gitlab/bin/metrics-server
git           63       1  1 20:07 ?        00:00:54 puma: cluster worker 0: 1 [gitlab-puma-worker]
git           65       1  1 20:07 ?        00:00:55 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           67       1  1 20:07 ?        00:00:59 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           69       1  1 20:07 ?        00:00:55 puma: cluster worker 3: 1 [gitlab-puma-worker]
git          546       0  0 21:37 pts/0    00:00:00 bash
git          556     546  0 21:37 pts/0    00:00:00 ps -ef

Note we are running Puma as PID 1. I don't believe --fork-worker is being used.

To Reproduce

I'm working on reproduction steps right now. I suspect #3255 might have caused this issue. I didn't see "reaped unknown child process" messages for this ActionCable fleet, though I did see them in another fleet of workers that didn't appear to have increased error rates.

One thing I observed is that previously wait_workers could call Process.kill(0, w.pid) to verify that each worker was still running:

puma/lib/puma/cluster.rb

Lines 515 to 523 in 52eff8d

  if Process.wait(w.pid, Process::WNOHANG)
    true
  else
    w.term if w.term?
    nil
  end
rescue Errno::ECHILD
  begin
    Process.kill(0, w.pid)

Now that check only happens if fork_worker is enabled?

puma/lib/puma/cluster.rb

Lines 565 to 573 in a287025

  if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
    true
  else
    w.term if w.term?
    nil
  end
rescue Errno::ECHILD
  begin
    Process.kill(0, w.pid)

@casperisfine @nateberkopec What do you think about reverting #3255 or putting it behind some configuration parameter?
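
To illustrate the second option, gating the per-worker wait behind configuration could look roughly like this. This is only a sketch; `wait_workers_individually` is a made-up option name, not an existing Puma setting:

# Sketch only: `wait_workers_individually` is hypothetical, shown just to
# illustrate putting the per-worker wait behind a configuration parameter.
check_individually = @options[:fork_worker] || @options[:wait_workers_individually]

if reaped_children.delete(w.pid) || (check_individually && Process.wait(w.pid, Process::WNOHANG))
  true
else
  w.term if w.term?
  nil
end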

Expected behavior

No errors.

  • Linux x86 running Puma 6.4.1 inside a Debian bookworm container
  • Google Kubernetes Engine

On my test instance with Puma 6.4.1, I ran kill -9 44, and puma: cluster worker 0 did not come back:

git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef

With Puma 6.4.0, that worked fine:

git@gitlab-webservice-default-78664bb757-97skg:/$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0 71 22:06 ?        00:00:36 puma 6.4.0 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           22       1  0 22:06 ?        00:00:00 /usr/local/bin/gitlab-logger /var/log/gitlab
git           36       0  0 22:07 pts/0    00:00:00 bash
git           65       1 22 22:07 ?        00:00:02 ruby /srv/gitlab/bin/metrics-server
git           68       1 22 22:07 ?        00:00:02 puma: cluster worker 0: 1 [gitlab-puma-worker]
git           70       1 22 22:07 ?        00:00:02 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           72       1 22 22:07 ?        00:00:02 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           74       1 22 22:07 ?        00:00:02 puma: cluster worker 3: 1 [gitlab-puma-worker]
git          148      36  0 22:07 pts/0    00:00:00 ps -ef
git@gitlab-webservice-default-78664bb757-97skg:/$ kill -9 68
git@gitlab-webservice-default-78664bb757-97skg:/$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0 66 22:06 ?        00:00:36 puma 6.4.0 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           22       1  0 22:06 ?        00:00:00 /usr/local/bin/gitlab-logger /var/log/gitlab
git           36       0  0 22:07 pts/0    00:00:00 bash
git           65       1 16 22:07 ?        00:00:02 ruby /srv/gitlab/bin/metrics-server
git           70       1 16 22:07 ?        00:00:02 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           72       1 17 22:07 ?        00:00:02 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           74       1 16 22:07 ?        00:00:02 puma: cluster worker 3: 1 [gitlab-puma-worker]
git          149       1 19 22:07 ?        00:00:00 puma: cluster worker 0: 1 [gitlab-puma-worker]
git          165      36  0 22:07 pts/0    00:00:00 ps -ef

I added debugging messages, and it seems that Process.wait2(-1, Process::WNOHANG) doesn't return anything after I run kill <PID of worker>. The process is left in the defunct state, so I'm a bit surprised that didn't work.
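
For reference, the kind of debug check I mean looks roughly like this (a sketch, not the actual patch; 44 is the PID of the defunct worker from the process list above):

# Run from the master process while a worker is defunct.
worker_pid = 44

# kill(0) succeeds because the zombie's process table entry still exists...
Process.kill(0, worker_pid)
warn "kill(0, #{worker_pid}) succeeded, process entry still present"

# ...yet the non-blocking wait on any child reports nothing here on Ruby 3.1/3.2,
# so the master never notices the dead worker.
result = Process.wait2(-1, Process::WNOHANG)
warn "wait2(-1, WNOHANG) => #{result.inspect}"   # prints nil instead of [44, #<Process::Status ...>]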

I applied this patch to get things working again:

diff --git a/lib/puma/cluster.rb b/lib/puma/cluster.rb
index 0d7c12bd..05d58445 100644
--- a/lib/puma/cluster.rb
+++ b/lib/puma/cluster.rb
@@ -562,7 +562,7 @@ module Puma
         begin
           # When `fork_worker` is enabled, some worker may not be direct children, but grand children.
           # Because of this they won't be reaped by `Process.wait2(-1)`, so we need to check them individually)
-          if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
+          if reaped_children.delete(w.pid) || Process.wait(w.pid, Process::WNOHANG)
             true
           else
             w.term if w.term?

I should note that on Linux Docker, PID 1 seems to work fine:

#!/bin/env ruby

fork do
  loop { sleep 1 }
end

loop do
  puts Process.wait2(-1, Process::WNOHANG)
  sleep 1
end

My Dockerfile:

FROM ruby:3.1
COPY listen.rb .
RUN chmod +x listen.rb
ENTRYPOINT ["/listen.rb"]

If I run this container and forcibly kill the child:

% docker exec -it 3b53dc0dcbd5 bash
root@3b53dc0dcbd5:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 23:36 pts/0    00:00:00 ruby /listen.rb
root         7     1  0 23:36 pts/0    00:00:00 ruby /listen.rb
root         8     0  0 23:36 pts/1    00:00:00 bash
root        14     8  0 23:36 pts/1    00:00:00 ps -ef
root@3b53dc0dcbd5:/# kill 7
root@3b53dc0dcbd5:/# %

I see:

7
pid 7 SIGTERM (signal 15)
/listen.rb:8:in `wait2': No child processes (Errno::ECHILD)
	from /listen.rb:8:in `block in <main>'
	from /listen.rb:7:in `loop'
	from /listen.rb:7:in `<main>'

Strangely, this also worked fine in Kubernetes. I repeated the test above with a Google Kubernetes Engine pod:

% kubectl run listen-test --image=registry.gitlab.com/stanhu/lfs-test/listen-test:latest
pod/listen-test created
% kubectl exec -it listen-test -- bash
root@listen-test:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:44 ?        00:00:00 ruby /listen.rb
root           7       1  0 05:44 ?        00:00:00 ruby /listen.rb
root          21       0  0 05:46 pts/0    00:00:00 bash
root          27      21  0 05:46 pts/0    00:00:00 ps -ef
root@listen-test:/# kill 7
bash: kill: (7) - No such process
root@listen-test:/# command terminated with exit code 137

With kubectl logs -f listen-test running, I see:

7
pid 7 SIGTERM (signal 15)
/listen.rb:8:in `wait2': No child processes (Errno::ECHILD)
	from /listen.rb:8:in `block in <main>'
	from /listen.rb:7:in `loop'
	from /listen.rb:7:in `<main>'

I wonder what's different about Puma.

Still can't replicate this issue with a simple pod running Puma:

Dockerfile.puma

FROM ruby:3.1

RUN gem install puma:6.4.1
COPY hello.ru .

ENTRYPOINT ["puma", "hello.ru", "-w", "2"]

hello.ru

hdrs = {'Content-Type'.freeze => 'text/plain'.freeze}.freeze
body = ['Hello World'.freeze].freeze
run lambda { |env| [200, hdrs, body] }

Process.wait2(-1, Process::WNOHANG) is working fine in a pod as PID 1. I tried with a non-root user as well.

Ok, our application has its own process supervisor that spawns a Prometheus metrics Web server. If I disable that, for some reason Process.wait2(-1, Process::WNOHANG) works and reaps the processes properly.

Most likely we're trapping SIGCHLD and interfering with the wait.

> Most likely we're trapping SIGCHLD and interfering with the wait.

I only quickly skimmed the code you linked (I have hundreds of emails to catch up on today), but this sounds weird.

I'm not sure why trapping SIGCHLD would make the wait fail. But I suppose at this stage it's best to try to come up with a smaller repro so we can better understand what's going on, and see what we could do to make this more resilient.

Yeah, I don't see any evidence we're trapping SIGCHLD, and I've tried to add signal handlers to see if that changes anything. I can't reproduce the problem yet.
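
For example, one of the variations I tried looks roughly like this (a sketch, not our actual supervisor code); so far the child is still reaped correctly, so trapping SIGCHLD alone doesn't seem to be enough:

# Install a SIGCHLD handler before forking, then check whether a
# non-blocking wait in the parent still sees the child's exit.
trap(:CHLD) { } # no-op handler, just to see if trapping alone interferes

pid = fork { exit 0 }
sleep 1 # give the child time to exit and become a zombie

# On an affected setup this would return nil; here it still returns [pid, status].
p Process.wait2(-1, Process::WNOHANG)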

I did notice that the Ruby implementation for Process.wait2 seems to mention SIGCHLD for some reason: https://github.com/ruby/ruby/blob/50c6cabadca44b7b034eae5dcc8017154a2858bd/process.c#L1343-L1348

Interesting, that was removed in 3.3: ruby/ruby#7527

It seems like it was converting a blocking waitpid into a non-blocking one by waiting for SIGCHLD; that sounds quite brittle to me, but I don't have the full context.
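
For context, the generic version of that pattern (a sketch of the idea, not ruby/ruby's actual code) is WNOHANG polling woken up by a SIGCHLD handler through a self-pipe:

# Emulate a blocking waitpid: poll with WNOHANG and sleep until SIGCHLD
# arrives via a self-pipe wake-up. Sketch of the general technique only.
reader, writer = IO.pipe
trap(:CHLD) { writer.write_nonblock("x") rescue nil }

pid = fork { sleep 2 }

loop do
  result = Process.wait2(pid, Process::WNOHANG)
  if result
    p result
    break
  end
  IO.select([reader], nil, nil, 1)    # wait for SIGCHLD or time out
  reader.read_nonblock(1) rescue nil  # drain the wake-up byte, if any
end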

Ok, it looks like in Ruby 3.1 and 3.2 Process.detach(<some PID != 1>) appears to prevent Process.wait2(-1, Process::WNOHANG) from finding child processes when the parent PID is 1.

The problem doesn't happen in Ruby 3.3. I wonder if ruby/ruby#7476 or ruby/ruby#7527 fixed this.

I'll update the comments in #3314 in light of this, so I think that pull request is still a good idea.

Here's a sample reproduction:

Dockerfile

FROM ruby:3.2

COPY listen.rb .
COPY test.sh .

RUN chmod +x listen.rb
RUN chmod +x test.sh

ENTRYPOINT ["/listen.rb"]

listen.rb

#!/bin/env ruby

fork do
  loop { sleep 1 }
end

Process.spawn({}, "./test.sh", err: $stderr, out: $stdout, pgroup: true).tap do |pid|
  STDERR.puts "detaching PID #{pid}"
  Process.detach(pid)
end

loop do
  STDERR.puts Process.wait2(-1, Process::WNOHANG)
  sleep 1
end

test.sh

#!/bin/sh
sleep 600

It appears that Process.detach simply spawns a separate thread that does a blocking waitpid. I think this uses the SIGCHLD implementation introduced in ruby/ruby@054a412. This comment in the commit message is telling:

We also work to suppress false-positives from Process.wait(-1, Process::WNOHANG) to quiets warnings from spec/ruby/core/process/wait2_spec.rb with MJIT enabled.

This makes me think that this code is suppressing child PIDs when the parent is PID 1.
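
In pure Ruby terms, Process.detach behaves roughly like this (a simplification; the real implementation is C code that goes through the interpreter's waitpid machinery, which is where the SIGCHLD path comes in):

# Rough equivalent of Process.detach(pid): a background thread dedicated to
# reaping that one PID so it never stays around as a zombie.
def detach_sketch(pid)
  Thread.new do
    begin
      Process.waitpid(pid)   # blocking wait on this specific child
    rescue Errno::ECHILD
      nil                    # already reaped elsewhere
    end
  end
end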

I've confirmed that this Ruby SIGCHLD business is responsible. I disabled WAITPID_USE_SIGCHLD in my Ruby 3.1.4 interpreter, and Process.wait(-1, Process::WNOHANG) started working again:

diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif
 
 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0
 
 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 #  define USE_SIGALTSTACK

It appears that only Process.detach is needed; PID 1 is not relevant. This Ruby script will get stuck in Ruby 3.1.4 and 3.2.2, but exits immediately in Ruby 3.3.0:

#!/bin/env ruby

forked_pid = fork do
  loop { sleep 1 }
end

Process.spawn({}, "sh -c 'sleep 60'", err: $stderr, out: $stdout).tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."

  # This works 
  # puts Process.wait2(forked_pid)

  # This fails in Ruby 3.1 and 3.2
  puts Process.wait2(-1)
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join

@stanhu I just realised I might be experiencing deja-vu with this thing. I have a bunch of notes/links at https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma-bug about "the ruby 2.6.0 wait bug", some comments:

Looks like #1741 implemented a workaround. Has something changed in Ruby yet again?

@dentarg Interesting! Given that ruby/ruby@054a412 was introduced in Ruby 2.6, I wonder if this broke Process.waitpid in a number of situations. With Ruby 3.3, that SIGCHLD implementation is gone, so I wonder if all these wait-related issues can be fixed without workarounds.

I see https://github.com/puma/puma/pull/1741/files#r266122715 mentions that Process.waitpid(-1, Process::WNOHANG) was not working, and #3255 introduced this in Puma v6.4.1. This seems to work okay until you use Process.detach, but maybe there are more situations where it doesn't work.

I created https://bugs.ruby-lang.org/issues/20181, but I noticed https://bugs.ruby-lang.org/issues/19322 mentions this summary:

  1. Programs doing waitpid -1 are bad and wrong, nobody should ever do that, if any code in your program does this anywhere, then Ruby should no longer make any guarantees about subprocess management working correctly in the entire process.
  2. Programs doing waitpid -1 are widely deployed, it would be good if, when writing gems, there were APIs we could use which offer better isolation and composability than the classic unix APIs, so that our gems work no matter what their containing processes are doing.
  3. Gems should never be spawning child processes anyway.

> @dentarg Interesting! Given that ruby/ruby@054a412 was introduced in Ruby 2.6, I wonder if this broke Process.waitpid in a number of situations. With Ruby 3.3, that SIGCHLD implementation is gone, so I wonder if all these wait-related issues can be fixed without workarounds.

Found another thing that probably relates to this? Just wanted to connect the dots.

This bug was actually reported in https://bugs.ruby-lang.org/issues/19837 and fixed in the ruby_3_2 and ruby_3_1 stable branches, but there has yet to be a release with the fix.