puma / puma

A Ruby/Rack web server built for parallelism

Home Page: https://puma.io

Puma 6.2.1 - Connection drops with Mongo Ruby Driver / Mongoid

johnnyshields opened this issue

Today we tried upgrading from Puma 5.6.5 to 6.2.1 and began seeing MongoDB database connection drops and reconnects in a loop.

The issue seems to be related to fork_worker, because removing fork_worker fixes the issue.

Here's a graph showing connection drops of our 3 db instances (1 primary, 2 secondary)

Our setup is pretty generic:

# puma.rb

# environment variable values in this deployment:
# WEB_CONCURRENCY = 12
# RAILS_MAX_THREADS = 24

environment ENV.fetch('RAILS_ENV', 'development')

workers ENV.fetch('WEB_CONCURRENCY', 0).to_i

threads_count = ENV.fetch('RAILS_MAX_THREADS', 5).to_i
threads threads_count, threads_count

bind "tcp://#{ENV.fetch('HTTP_LISTEN_ADDRESS', '0.0.0.0')}:#{ENV.fetch('HTTP_LISTEN_PORT', '9292')}"

activate_control_app 'tcp://0.0.0.0:9293', no_token: true

raise_exception_on_sigterm false

fork_worker

prune_bundler

before_fork do
  Mongoid.disconnect_clients
end

if ENV.fetch('RAILS_ENV', 'development') == 'development'
  worker_timeout 3600
  pidfile ENV.fetch('PIDFILE', 'tmp/pids/server.pid')
  plugin :tmp_restart
end

Hey @johnnyshields, do you in fact not have the on_worker_boot callback mentioned in the Mongoid docs?

This setup seems to work (I've been testing enabling fork_worker this week):

on_worker_boot do
  Mongoid::Clients.clients.each do |name, client|
    client.close
    client.reconnect
  end
end

before_fork do
  Mongoid.disconnect_clients
end

Although after adding these callbacks the Mongo driver still complains about the new PID (which I would have thought the callbacks would prevent):

WARN -- : MONGODB | Detected PID change - Mongo client should have been reconnected (old pid 35395, new pid 35457)

BUT, at least in this state, the connections are reconnected properly and the connection monitor stays up and is healthy. This may be key to what you could be seeing: in all of my testing this week (trying a bunch of different combinations to reconnect without the PID change warning), I've come across a few different ways to make the connection monitor stop monitoring the pool. I found this note in the docs that gave me a way to test it. Whenever my pool ended up in a NO-MONITORING state, my app behaved very strangely and had lots of issues making DB requests reliably.
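If it helps to sanity-check, one thing you could do is ping Mongo from a freshly booted worker and log the result. This is only a rough sketch (it assumes Mongoid's default client is the one your app uses, and the log text is just illustrative), but a failed ping here is a quick signal that the forked worker's pool isn't usable:

# puma.rb (sketch) - verify each forked worker can actually reach MongoDB
on_worker_boot do
  begin
    # `ping` is a cheap server command; it fails fast if the pool is unusable
    ::Mongoid.default_client.database.command(ping: 1)
    puts "[puma] worker #{Process.pid}: MongoDB ping OK"
  rescue ::Mongo::Error => e
    puts "[puma] worker #{Process.pid}: MongoDB ping failed - #{e.class}: #{e.message}"
  end
end

Puma lets you declare on_worker_boot more than once, so this can sit alongside the reconnect block above.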

@travisbell thanks for the quick response. I do not have that; it seems it was never an issue on Puma 5. I can reproduce this in staging, so I'll try it and let you know.

@travisbell I added on_worker_boot and tried both preload_app! true and false, but nothing worked. Still seeing the connection errors.

An update on this. Firstly, prune_bundler seemed to be causing some issues (which it did not on Puma 5). Even after removing prune_bundler, we still see connection issues whether or not preload_app! is enabled; however, with preload_app! enabled we see far more connection errors than without it.

Here's our config with which we tested and saw issues:

# environment variable values used for this test:
# WEB_CONCURRENCY = 2
# RAILS_MAX_THREADS = 5

workers ENV.fetch('WEB_CONCURRENCY', 0).to_i

threads_count = ENV.fetch('RAILS_MAX_THREADS', 5).to_i
threads threads_count, threads_count

bind "tcp://#{ENV.fetch('TABLESOLUTION_HTTP_LISTEN_ADDRESS', '0.0.0.0')}:#{ENV.fetch('TABLESOLUTION_HTTP_LISTEN_PORT', '9292')}"

raise_exception_on_sigterm false

fork_worker

on_worker_boot do
  if defined?(::Mongoid)
    ::Mongoid::Clients.clients.each do |_name, client|
      client.close
      client.reconnect
    end
  end
end

before_fork do
  ::Mongoid.disconnect_clients if defined?(::Mongoid)
end

In your original post you mention enabling fork_worker... so does the problem go away when it is disabled? I'm wondering what the simplest version of the config looks like where you start to notice the issue.

Removing fork_worker seems to fix the issue.
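
To answer the simplest-config question: a stripped-down puma.rb that should still hit the problem would presumably be just workers plus fork_worker with the Mongoid hooks, along these lines (a sketch distilled from the configs above, not something we re-tested in isolation):

workers 2
threads 5, 5

fork_worker

# Re-establish MongoDB connections in each forked worker
on_worker_boot do
  ::Mongoid::Clients.clients.each do |_name, client|
    client.close
    client.reconnect
  end
end

# Close connections in the parent before forking
before_fork do
  ::Mongoid.disconnect_clients
end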

This issue was fixed by upgrading the Mongo Ruby Driver to 2.18.2 (we were on 2.17.3). I did not manage to track down the exact change in the Mongo driver that fixed this... strange, though, because Puma 5 gave me no issues.
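
For anyone else who lands here: pinning the driver in the Gemfile is a simple way to make sure you're on the fixed version (a sketch; mongo is the driver gem that Mongoid depends on):

# Gemfile
gem 'mongoid'
gem 'mongo', '>= 2.18.2' # driver version that resolved the reconnect loop for us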

Here's my final puma.rb that I went live with. (I'm using Kubernetes, so I don't need any of the hot-reloading stuff.)

environment ENV.fetch('RAILS_ENV', 'development')

workers ENV.fetch('WEB_CONCURRENCY', 0).to_i

threads_count = ENV.fetch('RAILS_MAX_THREADS', 5).to_i
threads threads_count, threads_count

bind "tcp://#{ENV.fetch('HTTP_LISTEN_ADDRESS', '0.0.0.0')}:#{ENV.fetch('HTTP_LISTEN_PORT', '9292')}"

raise_exception_on_sigterm false

fork_worker

on_worker_boot do
  ::Mongoid::Clients.clients.each do |_name, client|
    client.close
    client.reconnect
  end
end

before_fork do
  ::Mongoid.disconnect_clients
end