Problem with server connections from Odyssey through HAProxy
skilyazhnev opened this issue
Hi! First of all, thanks for this product.
Second: we have an installation with Odyssey <---> HAProxy <---> PostgreSQL.
After switching our app to Odyssey, we received a lot of strange errors like
T13:02:05Z info [cb47523521fcb sb7656329d4c1] (main) server disconnected (read/write error): Broken pipe, status OD_ESERVER_WRITE
and
T13:15:05Z info [c56515b274f34 sa0a3bb0e2c20] (main) server disconnected (read/write error): Resource temporarily unavailable, status OD_ESERVER_READ
which broke the query flow and crashed our app.
We traced the problem to the HAProxy configuration, which affects Odyssey and causes (from my point of view) strange Odyssey behavior.
Suppose we have these HAProxy client/server timeouts:
defaults
    mode tcp
    log global
    retries 2
    timeout queue 5s
    timeout connect 20s
    timeout client 60m
    timeout server 60m
    timeout check 20s
After 60 minutes of inactivity HAProxy sends a TCP FIN to Odyssey, and Odyssey... should close the connection? Or handle it somehow?
In pgbconsole I see that Odyssey just forgets where it must send packets (addr, port), but it doesn't close the "logical" server connection.
After that, a client tries to use this server connection and fails.
[Screenshot: red squares mark the places where the connections just disappeared.]
We tried to use pool_ttl, but we still hit this situation, only in a more complex form.
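To illustrate what "HAProxy sent a FIN but the pooler did not notice" means at the socket level, here is a minimal sketch using only Python's standard `socket` module on Linux. It is not Odyssey's code (Odyssey is written in C); `peer_closed` is a hypothetical helper, and a `socketpair` stands in for the Odyssey <---> HAProxy link. A non-blocking `MSG_PEEK` read returns `b''` once the peer has closed its end, so an idle connection can be checked before it is handed to a client. Writing to the dead link instead fails with a broken pipe error, similar to the `OD_ESERVER_WRITE` log line above; over real TCP the first write after a FIN may still succeed and the error only appears on a later write or read.

```python
import socket


def peer_closed(sock: socket.socket) -> bool:
    """Hypothetical check: has the peer closed its end of an idle socket?

    Peeks one byte without blocking and without consuming it:
    - b'' means EOF, i.e. the peer sent FIN (orderly close);
    - BlockingIOError means the socket is still open with nothing to read;
    - pending bytes mean the peer is still sending, so treat it as alive.
    """
    try:
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # open and idle
    except ConnectionResetError:
        return True   # peer sent RST
    return data == b""


if __name__ == "__main__":
    # pooler_side plays Odyssey's server connection, proxy_side plays HAProxy.
    pooler_side, proxy_side = socket.socketpair()
    print(peer_closed(pooler_side))  # False: idle but alive

    proxy_side.close()               # simulate HAProxy closing after "timeout server"
    print(peer_closed(pooler_side))  # True: the close is visible before reuse

    try:
        pooler_side.sendall(b"Q")    # reusing the dead connection anyway
    except BrokenPipeError as exc:
        print("write failed:", exc)  # analogous to "Broken pipe, status OD_ESERVER_WRITE"
    pooler_side.close()
```

A pooler that ran a check like this (or closed idle server connections before the proxy timeout) would drop the stale connection instead of giving it to a client.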
database "database" {
    user "postgres" {
        authentication "clear_text"
        password "password"
        storage "storage_name.database"
        storage_db "postgres"
        storage_user "postgres"
        storage_password "password"
        pool "session"
        pool_size 60
        pool_timeout 0
        pool_ttl 60
        pool_cancel yes
        pool_rollback yes
        pool_discard yes
        client_fwd_error yes
        log_debug no
    }
}
Suppose the pool holds 5 connections and one of them is "broken". If clients only occasionally use one connection from the pool, the "broken" connection is never closed, and it can break the query flow when the load rises.
It looks like pool_ttl is a timer for the whole pool rather than for each separate server connection.
So far we have not found a reliable solution for this problem, and I have not found similar cases in the issues.
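The difference between the two readings of pool_ttl can be shown with a toy model. This is a hypothetical sketch, not Odyssey's implementation: `ServerConn`, `expire_per_connection` and `expire_pool_wide` are invented names, and times are plain numbers in seconds. It replays the scenario above: 5 idle connections, only one of them reused recently, pool_ttl 60.

```python
from dataclasses import dataclass


@dataclass
class ServerConn:
    name: str
    last_used: float  # seconds; when this connection was last returned to the pool


def expire_per_connection(idle, now, ttl):
    """Per-connection TTL: close every idle connection unused for more than ttl."""
    return [c.name for c in idle if now - c.last_used > ttl]


def expire_pool_wide(idle, now, ttl):
    """Pool-wide TTL: close idle connections only if the whole pool was idle > ttl.

    Recent activity on any one connection keeps all the others alive too.
    """
    last_activity = max((c.last_used for c in idle), default=now)
    return [c.name for c in idle] if now - last_activity > ttl else []


# 5 idle connections; conn0 was reused 10 s ago, conn1..conn4 sat idle for an hour
# (one of them could be the "broken" connection HAProxy already closed).
idle = [ServerConn(f"conn{i}", last_used=0.0) for i in range(5)]
idle[0].last_used = 3590.0
now, ttl = 3600.0, 60.0

print(expire_per_connection(idle, now, ttl))  # ['conn1', 'conn2', 'conn3', 'conn4']
print(expire_pool_wide(idle, now, ttl))       # []
```

Under the per-connection reading the stale connections are closed long before HAProxy's 60-minute timeout; under the pool-wide reading occasional use of one connection keeps the broken one in the pool, which matches what we observe.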