yandex / odyssey

Scalable PostgreSQL connection pooler

Problem with server connections from Odyssey through HAproxy

skilyazhnev opened this issue

Hi! First of all thanks for this product.

Second.

We have an installation with Odyssey <---> HAproxy <---> PostgreSQL.
After switching the app to Odyssey we received a lot of strange errors like

T13:02:05Z info [cb47523521fcb sb7656329d4c1] (main) server disconnected (read/write error): Broken pipe, status OD_ESERVER_WRITE

and

T13:15:05Z info [c56515b274f34 sa0a3bb0e2c20] (main) server disconnected (read/write error): Resource temporarily unavailable, status OD_ESERVER_READ

which broke the query flow and crashed our app.

We found a problem in the HAproxy configuration which affects Odyssey and creates (from my point of view) strange Odyssey behavior.

If we have this HAproxy client/server timeout configuration:

defaults
    mode                    tcp
    log                     global
    retries                 2
    timeout queue           5s
    timeout connect         20s
    timeout client          60m
    timeout server          60m
    timeout check           20s

After 60 minutes HAproxy sends a TCP FIN to Odyssey and it... must close the connection? Or handle that somehow?
In pgbconsole I can see that Odyssey just forgets where it must send packets (addr, port), but it doesn't close the "logical" server connection.
And after that, a client tries to use that server connection and fails.
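Below is a minimal sketch of this scenario, with placeholder connection parameters (host, port and credentials are illustrative, not our real setup): keep a session through Odyssey idle for longer than HAproxy's 60-minute timeout, then reuse it.

import time
import psycopg2

# Connect through Odyssey (listen address/port are assumed placeholders).
conn = psycopg2.connect(
    host="odyssey.local",
    port=6432,
    dbname="database",
    user="postgres",
    password="password",
)
cur = conn.cursor()
cur.execute("SELECT 1")   # warms up a server connection in the pool
print(cur.fetchone())

time.sleep(61 * 60)       # stay idle longer than HAproxy's 60m timeout

try:
    # By now HAproxy has sent FIN on the idle backend socket; Odyssey still
    # keeps the "logical" server connection, so the query goes to a dead
    # socket and the client sees an error like the ones quoted above.
    cur.execute("SELECT 1")
    print(cur.fetchone())
except psycopg2.OperationalError as e:
    print("query on the stale pooled connection failed:", e)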

Red squares mark the places where the connections just disappeared:
[screenshot]

We tried to use pool_ttl, but we still run into this situation, just in a more complex form.

database "database" {
        user "postgres" {
                authentication "clear_text"
                password "password"
                storage "storage_name.database"
                storage_db "postgres"
                storage_user "postgres"
                storage_password "password"
                pool "session"
                pool_size 60
                pool_timeout 0
                pool_ttl 60
                pool_cancel yes
                pool_rollback yes
                pool_discard yes
                client_fwd_error yes
                log_debug no
        }
    }

If we have one "broken" connection in the pool, the pool holds 5 connections, and a client only occasionally takes a connection from the pool, then the "broken" connection never gets closed and it can break the query flow when the load rises.
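A rough sketch of how this shows up (again with placeholder connection parameters): most new sessions land on healthy pooled server connections, but occasionally one is attached to the connection HAproxy has already closed, and its first query fails.

import psycopg2

# Assumed placeholder DSN pointing at Odyssey, not our real credentials.
DSN = "host=odyssey.local port=6432 dbname=database user=postgres password=password"

failures = 0
for i in range(20):
    conn = psycopg2.connect(DSN)
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
    except psycopg2.OperationalError as e:
        # This session was attached to the dead server connection.
        failures += 1
        print(f"attempt {i}: {e}")
    finally:
        conn.close()

print(f"{failures} of 20 attempts failed, seemingly at random")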

It looks like pool_ttl is a timer for the whole pool, not for each separate server connection.

In general, we have not yet found a solid solution to this problem, and I have not found similar cases in the issues.