elixir-ecto / db_connection

Database connection behaviour

Home Page:http://hexdocs.pm/db_connection/DBConnection.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve context for timeout errors logged

tpitale opened this issue · comments

We're seeing a randomly occurring error (does not correlate to slow queries or increased load on our database server):

Postgrex.Protocol (#PID<0.393.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.23831.49> timed out because it queued and checked out the connection for longer than 30000ms

Which I believe comes from ConnectionPool's timeout handling. Unfortunately, there is no record of this in our slow query logs.

I'm not sure which direction to take when debugging further. If possible, could more information be added that might help direct us to the cause of this timeout? Perhaps some sort of backtrace or more information about the processes with these PIDs?

Is it possible that this log is a mistake? I see there is a check for the current holder:

Check that timeout refers to current holder (and not previous)

Is there a stacktrace for the error? Also, can you please include your DBConnection version?

Sorry! db_connection version 2.0.5.

No stacktrace has been logged. I think it's coming from: https://github.com/elixir-ecto/db_connection/blob/master/lib/db_connection/connection_pool.ex#L90

I'll try to wait for the next error to happen and see if Process.info() gives me more info on the PID, if it's alive.

@tpitale yup, adding a Process.info is exactly what I had in mind. But we would do it inside DBConnection. Something like this:

  message = "client #{inspect pid} timed out because it checked out " <>
        "the connection for longer than #{timeout}ms"

  case Process.info(pid, :current_stacktrace) do
    {:current_stacktrace, stacktrace} ->
      message <> "\n\n#{inspect pid} was at location:\n\n" <>
        Exception.format_stacktrace(stacktrace)
    _ ->
      message
  end

Can you please send a PR?

@josevalim Would you like something similar in the timeout behavior in Connection, too? Or should I do that as its own PR?

@tpitale I think doing so in the Connection module would only point to DBConnection internals? So I am not sure if that is useful. So let's go in steps. Does the one proposed above provide enough information to fix the issue? :)

Oh, I misunderstood your question. Yes, both places in the PR look good! 👍