puma / puma

A Ruby/Rack web server built for parallelism

Home Page: https://puma.io


High worker backlog

donovanhubbard opened this issue · comments

Describe the bug
I'm using puma 5.6.4 with ruby 2.7.5 on kubernetes with 8 workers per pod and 1 thread per worker. I'm consistently seeing a high backlog on some of my workers: while some workers have backlogs close to zero, I've seen others jump as high as 45. In one case I had a pod of 8 workers, 6 of which were idle, while the other two workers had a backlog of 17.

I understand that the workers are getting into a high backlog because they have requests that are taking a long time to complete, and since they are single-threaded the other requests pile up in the @todo array. I can even confirm from my logs that those particular requests on those particular pods are taking a really long time.

My two main questions are: why are requests getting backed up on an already-busy worker, and what can I do about it?

Puma config:

preload_app!
workers(8)
queue_requests(true) 
threads(1,1)
worker_boot_timeout("120")
activate_control_app('tcp://127.0.0.1:9293', { auth_token: 'foo' })
persistent_timeout("131") 
bind("tcp://0.0.0.0:8888")

before_fork do
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  if defined?(Redis)
    if Redis.current
      Redis.current.disconnect!
    end
  end
end

on_worker_boot do
  if defined?(ActiveRecord)
    ActiveRecord::Base.establish_connection
  end
end

Command line:

bundle exec puma -C puma.rb

To Reproduce

Unfortunately I am unable to reproduce the problem with a trivial example. The behavior I see running on my MacBook is different from what I see in production. In production I see active workers processing a very long-running request while building a growing backlog.

On my MacBook I used the sleep.ru rackup file with a single worker and a single thread. I sent a first request with a 60-second sleep value and it hung for 60 seconds as expected. If I submitted another request during that sleep, it got stuck behind the first one, but it never made it into the @todo array and the backlog reportedly stayed at zero.
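For reference, a minimal sleep-style rackup file along these lines (a sketch only; reading the duration from the query string is an assumption, and Puma's bundled sleep.ru may differ):

# sleep.ru -- minimal sketch of a sleep-based rackup file.
# Assumption: the sleep duration comes from the query string, e.g. GET /?60
run lambda { |env|
  seconds = env['QUERY_STRING'].to_i  # 0 if no query string given
  sleep seconds
  [200, { 'Content-Type' => 'text/plain' }, ["slept #{seconds}s\n"]]
}

Run with something like bundle exec puma -w 1 -t 1:1 sleep.ru to get a single worker with a single thread.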

I believe the reason is that the Puma::Server handle_servers method calls pool.wait_until_not_full and just hangs until there is an available thread. Since the request is still sitting on the socket and hasn't been pushed to the @todo array, the backlog does not increment even though the end user is still waiting for the request to be processed. This is consistent with my experiments on my laptop and with what's written in the documentation.
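A simplified sketch of the accept path as I understand it (approximate pseudocode, not Puma's actual source):

# Approximate sketch, not Puma's real code: the worker blocks *before*
# accepting when all of its threads are busy, so the pending connection
# stays in the OS listen backlog and @todo (and thus the reported backlog)
# stays empty.
loop do
  pool.wait_until_not_full            # block while every thread is busy
  io = server_socket.accept_nonblock  # only now accept the connection
  client = Puma::Client.new(io)
  pool << client                      # pushed onto @todo; backlog grows
end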

This is the opposite of what I see in production, where the backlog is very high despite each worker having a single thread.

So, coming back to my original questions: why are workers adding requests to the @todo array when they are already busy, and what can I do to mitigate the impact this has on my end users?

Desktop (please complete the following information):

  • OS: Mac
  • Puma Version: 5.6.4

Have you tried wait_for_less_busy_worker?


Attempts to route traffic to less-busy workers by causing them to delay listening on the socket, allowing workers which are not processing any requests to pick up new requests first.
Only works on MRI. For all other interpreters, this setting does nothing.
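(For reference, enabling it should just be a one-liner in puma.rb; a sketch, assuming Puma 5.x where this DSL method exists and where 0.005 is, as far as I know, the default delay in seconds:)

# puma.rb -- sketch: have busy workers pause briefly before re-listening,
# so idle workers get first crack at accepting new connections (MRI only).
wait_for_less_busy_worker 0.005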


wait_for_less_busy_worker should have no effect due to wait_until_not_full.

Can you confirm via log output in production that you're definitely running 1 thread?

Here is a snippet from the STDOUT of the production pod's boot up.

 * Starting metrics server on tcp://0.0.0.0:9294
 Puma starting in cluster mode...
 * Puma version: 5.6.4 (ruby 2.7.5-p203) ("Birdie's Version")
 *  Min threads: 1
 *  Max threads: 1
 *  Environment: production
 *   Master PID: 1
 *      Workers: 8
 *     Restarts: (✔) hot (✖) phased
 * Preloading application
 Top level ::CompositeIO is deprecated, require 'multipart/post' and use `Multipart::Post::CompositeReadIO` instead!
 Top level ::Parts is deprecated, require 'multipart/post' and use `Multipart::Post::Parts` instead!
  * Listening on http://0.0.0.0:8888

Thanks, just wanted to rule that out. Also, since you didn't say it explicitly: when you say "seeing a high backlog", you mean you're seeing a high backlog value in Puma.stats, right?
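(For reference, a rough sketch of pulling the per-worker backlog out of Puma's stats in-process from the master; the key names are what I'd expect from 5.x cluster-mode stats, so treat them as approximate:)

# Rough sketch, run in the master process (e.g. from a plugin):
# list each worker's reported backlog from the cluster-mode stats hash.
stats = Puma.stats_hash
stats[:worker_status].each do |w|
  puts "worker #{w[:index]}: backlog=#{w.dig(:last_status, :backlog)}"
end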

You are correct. I'm not currently running the puma control server in production, but I am using the puma-metrics plugin that converts the stats data to a Prometheus-compatible format. Below is not one of the most extreme examples we've seen, but it's what I could grab right now. FYI, the highest puma_backlog any of our workers has reached in the last 24 hours (a Sunday) was 16.

You'll notice that workers 0 and 6 have a puma_backlog of 5 and 6 respectively, despite the fact that workers 1, 3, and 5 show a puma_pool_capacity of 1, meaning they are idle. Hence my raising this issue: why are those requests sitting in a busy worker's backlog when they should be going to an available worker?

# HELP puma_backlog Number of established but unaccepted connections in the backlog
puma_backlog{index="0"} 5.0
puma_backlog{index="1"} 0.0
puma_backlog{index="2"} 2.0
puma_backlog{index="3"} 0.0
puma_backlog{index="4"} 0.0
puma_backlog{index="5"} 0.0
puma_backlog{index="6"} 6.0
puma_backlog{index="7"} 2.0
# TYPE puma_running gauge
# HELP puma_running Number of running worker threads
puma_running{index="0"} 1.0
puma_running{index="1"} 1.0
puma_running{index="2"} 1.0
puma_running{index="3"} 1.0
puma_running{index="4"} 1.0
puma_running{index="5"} 1.0
puma_running{index="6"} 1.0
puma_running{index="7"} 1.0
# TYPE puma_pool_capacity gauge
# HELP puma_pool_capacity Number of allocatable worker threads
puma_pool_capacity{index="0"} 0.0
puma_pool_capacity{index="1"} 1.0
puma_pool_capacity{index="2"} 0.0
puma_pool_capacity{index="3"} 1.0
puma_pool_capacity{index="4"} 0.0
puma_pool_capacity{index="5"} 1.0
puma_pool_capacity{index="6"} 0.0
puma_pool_capacity{index="7"} 0.0
# TYPE puma_max_threads gauge
# HELP puma_max_threads Maximum number of worker threads
puma_max_threads{index="0"} 1.0
puma_max_threads{index="1"} 1.0
puma_max_threads{index="2"} 1.0
puma_max_threads{index="3"} 1.0
puma_max_threads{index="4"} 1.0
puma_max_threads{index="5"} 1.0
puma_max_threads{index="6"} 1.0
puma_max_threads{index="7"} 1.0
# TYPE puma_requests_count gauge
# HELP puma_requests_count Number of processed requests
puma_requests_count{index="0"} 41219.0
puma_requests_count{index="1"} 41672.0
puma_requests_count{index="2"} 41774.0
puma_requests_count{index="3"} 41420.0
puma_requests_count{index="4"} 44358.0
puma_requests_count{index="5"} 43267.0
puma_requests_count{index="6"} 40933.0
puma_requests_count{index="7"} 40496.0
# TYPE puma_workers gauge
# HELP puma_workers Number of configured workers
puma_workers 8.0
# TYPE puma_booted_workers gauge
# HELP puma_booted_workers Number of booted workers
puma_booted_workers 8.0
# TYPE puma_old_workers gauge
# HELP puma_old_workers Number of old workers
puma_old_workers 0.0

why are workers adding requests to the @todo array when they are already busy

Workers accept sockets (wrapped in Puma by the Client class), but a socket may have several requests, as many sockets are 'keep-alive'. Once a socket is accepted by a worker, there is logic to determine whether to close it, mostly based on whether the whole server is busy and the number of requests received from the socket. Otherwise, if the socket keeps sending requests quickly, several will be processed on the same worker.

So, given the above, if a socket is accepted on a given worker, quite often all its requests will be handled by that worker.
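To see this locally, you can push several requests down a single keep-alive connection; a small sketch with Net::HTTP (host and port are placeholders):

require 'net/http'

# Net::HTTP keeps the TCP connection open for the duration of the block,
# so both requests travel over the same socket -- and therefore land on
# whichever Puma worker accepted that socket.
Net::HTTP.start('127.0.0.1', 8888) do |http|
  2.times do |i|
    res = http.get("/?#{i}")
    puts "request #{i}: HTTP #{res.code}"
  end
end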

Also, from the above log:
'HELP puma_backlog Number of established but unaccepted connections in the backlog'

I don't think that's correct, or at least it's misleading. The worker backlog is the number of sockets waiting to be processed by that worker. The phrase implies it is the number of sockets sitting in the listening server socket waiting to be accepted, which would be the same for every worker, as they're all listening on the same server socket...

Lastly, from the above log, the 'puma_requests_count' varies by ~10%. That seems like pretty even distribution.

We ultimately set queue_requests(false) and saw a huge performance improvement. There are still occasional requests that end up in the backlog for some reason, but it stays quite small.
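(For anyone landing here later, the change was just this line in puma.rb:)

# puma.rb -- hand each accepted connection straight to a worker thread
# instead of buffering it in the reactor first.
queue_requests false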

So, given the above, if a socket is accepted on a given worker, quite often all its requests will be handled by that worker.

This makes sense... d'oh! Not sure why I didn't think of multiple requests on the same connection.

I'm having a similar issue. With queue_requests set to true, requests see around 35% (100 ms) more queue time according to New Relic. Setting it to false makes the request queue time mostly 0, although with it disabled we occasionally get 504 errors from the AWS application load balancer when Puma refuses the connection.

Using Puma 6.1.1 and Ruby 3.2.1. I'm running Puma inside Docker with a similar config: 1 thread per worker and 4 workers.

Some metrics: [screenshots]

Using a thread profiler: [screenshot]

@basex Read the discussion in #3071 and pay attention to the replies from Nate

@dentarg I'm measuring the queue time by setting a header in nginx, which passes the request to a load balancer and then to Puma.