puma / puma

A Ruby/Rack web server built for parallelism

Home Page: https://puma.io

Puma workers are freezing

m3nd3s opened this issue

Describe the bug

Let's start with this picture:

image

I have a Rails application hosted on AWS using ECS (Elastic Container Service) with Docker. Puma is configured with 3 workers, each with 8 threads.

For some reason the workers sometimes freeze: they become unresponsive for about 2 minutes 30 seconds, until Puma detects the timeout and sends kill signals to stop all the workers.

So I would like to ask for help identifying what can cause this kind of problem. Is there any way to debug it?

There is another behavior. I'm not sure whether Puma stops receiving requests or keeps receiving them, but after the Puma timeout and the kill signal, a lot (hundreds) of requests fail on Nginx (Puma is behind Nginx) with a 429 HTTP status code.

Puma config:

Puma is configured with 3 workers and 8 threads (min and max).
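
For reference, a setup like the one described would look roughly like this in a Puma config file. This is a sketch reconstructed from the description, not the poster's actual file: the file path is hypothetical, the port is assumed from the Nginx upstream quoted later in the thread, and the commented-out `worker_timeout` line only shows Puma's documented default, not a value the poster reported.

```ruby
# config/puma.rb (hypothetical path; a sketch, not the poster's real config)

# Three forked worker processes (cluster mode).
workers 3

# Each worker runs a thread pool with min 8 and max 8 threads.
threads 8, 8

# Listen on TCP port 3000 (assumed from the Nginx upstream "app:3000").
port 3000

# Seconds without a check-in before the master kills a worker.
# 60 is Puma's default; the poster's actual value is unknown.
# worker_timeout 60
```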

To Reproduce

I don't know how to reproduce it. Actually, identifying how to reproduce it would be very helpful.

Expected behavior

Desktop (please complete the following information):

  • OS: Linux (via docker)
  • Puma Version: 5.5.2

after the puma timeout and the signal to kill, there are a lot (hundreds) of requests failing on Nginx (Puma is behind of Nginx) with 429 HTTP status code.
That sounds like the requests sitting in the socket backlog; see https://github.com/puma/puma/blob/4ac14482f1eda4bcf2d2baa3a379afe3f5b55a9c/docs/architecture.md

Thanks for helping.

Actually, Nginx is configured to communicate with Puma over a TCP connection:

upstream app {
  server app:3000;
}

You still have a backlog
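
To illustrate why a TCP listener still has a backlog, here is a minimal sketch in plain Ruby (no Puma involved). The listening socket never calls `accept`, standing in for a frozen worker, yet the client's connection and request are still taken by the kernel and queued. The backlog size of 8 is an arbitrary value chosen for the example.

```ruby
require "socket"

# A listening TCP socket that never calls accept, standing in for a frozen worker.
server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
server.listen(8)                       # arbitrary backlog size for this example
port = server.addr[1]

# The client connects and sends a request. The kernel completes the TCP
# handshake on its own, so this succeeds even though nothing is accepting.
client = TCPSocket.new("127.0.0.1", port)
client.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")

# The listening socket reports "readable" when a connection is waiting
# in the backlog to be accepted.
pending = IO.select([server], nil, nil, 1)
backlog_has_connection = !pending.nil?
puts "connection waiting in backlog: #{backlog_has_connection}"

client.close
server.close
```

From the client's side (Nginx in this thread) the request looks delivered; it only fails later if the process that owns the socket is killed before accepting it.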

Not sure if it helps, but I was playing around with Puma for testing purposes and also bumped into this. What seemed to help (quite a lot) in my case was clearing the tmp/cache folder.

@gingerlime Maybe, I'll take a look.

Is there any way to verify if a puma worker is stuck or frozen?

Is there any way to verify if a puma worker is stuck or frozen?

...ask it to check in via a pipe every 60 seconds? 😆

If it's not checking in via the checkpipe, then something is pretty wrong. It's also interesting that all 3 workers lock up at the same time.
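
One way to watch those check-ins from the outside is to look at the `last_checkin` field that Puma reports per worker in its cluster stats (available, for example, from `Puma.stats` in the master process or from `pumactl stats`). The sketch below is a hypothetical helper, not part of Puma: `stale_workers` and the sample JSON are made up for illustration, with the JSON following the shape of Puma's cluster-mode stats.

```ruby
require "json"
require "time"

# Hypothetical helper (not a Puma API): returns the indexes of workers whose
# last check-in is missing or older than `timeout` seconds.
def stale_workers(stats_json, now:, timeout: 60)
  stats = JSON.parse(stats_json)
  stale = stats.fetch("worker_status", []).select do |worker|
    last = worker["last_checkin"]
    last.nil? || (now - Time.parse(last)) > timeout
  end
  stale.map { |worker| worker["index"] }
end

# Made-up sample data shaped like Puma's cluster-mode stats output.
sample = <<~JSON
  {
    "workers": 3,
    "worker_status": [
      {"index": 0, "pid": 101, "last_checkin": "2022-01-01T12:00:55Z"},
      {"index": 1, "pid": 102, "last_checkin": "2022-01-01T11:58:30Z"},
      {"index": 2, "pid": 103, "last_checkin": "2022-01-01T12:00:10Z"}
    ]
  }
JSON

now = Time.parse("2022-01-01T12:01:00Z")
stale = stale_workers(sample, now: now, timeout: 60)
puts "stale worker indexes: #{stale.inspect}"
```

In this sample only worker 1 is flagged, since its last check-in is 150 seconds old. Polling something like this and logging when a worker first goes stale could help narrow down what the application was doing just before the freeze.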

So what resources do all three of those workers share?

  1. CPU
  2. Memory (though this one seems unlikely to me; if you were out of memory, the OOM killer would probably just blast them first)
  3. Something application specific? It would need to hold the global VM lock, otherwise our checkin would succeed.

In any case, this is probably not a Puma issue but something with your application or setup. Good luck!