yandex / odyssey

Scalable PostgreSQL connection pooler

Feature: Listen/Notify Support with Transaction Pooling

merarischroeder opened this issue

Problem

  • For situations where more than 200 separate client programs need to wait for a PostgreSQL Notify (NotificationResponse) message.
  • Each program might use a different username/password to login to the PostgreSQL server
  • The use of NOTIFY/LISTEN doesn't scale with PostgreSQL. PostgreSQL doesn't tend to work well with any more than 200 concurrent connections.
  • The NOTIFY/LISTEN mechanism isn't aggregated by Odyssey.
  • NOTIFY/LISTEN doesn't work with Transaction Pooling, only with Session Pooling
  • NOTIFY/LISTEN is quite useful for SIGNALLING (not for queued data), so a NOTIFIED process knows to Query again (instead of polling on an interval).
  • I would like to use just PostgreSQL and not have to add more infrastructure

A diagram of the current problem:
[Diagram: Current-Desired]

Notify (and Listen) doesn't work for Transaction Pooling. This is because the NotificationResponse can only be sent back on the connection that requested LISTEN. So if there are 5 (or 200) PostgreSQL Clients on different processes that individually request LISTEN on different topics, each of their connections must remain open. This means Odyssey cannot reduce the number of server connections: it must connect to the PostgreSQL server as many times as there are listening clients.

Inadequate workarounds

  • Session Mode doesn't scale. If there are 1000x Odyssey clients there needs to be 1000x PostgreSQL connections (no pooling).
  • If other infrastructure is used, there needs to be a connection-per-client maintained anyway, and when SIGNALLED there will be a maximum of 2x Connections per client (one for PostgreSQL and one for some other Signalling system)
  • Some PostgreSQL installations don't have integration with other infrastructure. That is, if you planned to call NOTIFY in a trigger, there may be no alternative when using other infrastructure (with limitations from the IaaS provider).

Solution

  • Maintain a single [Listen Connection] to the PostgreSQL server
  • Isolate LISTEN commands from connections in Transaction Pooling mode
  • Read the LISTEN command and map it against the Odyssey TCP connection - using a simple PUB/SUB mechanism
  • Send each distinct LISTEN command onward to the [Listen Connection] instead
  • Upon NOTIFY from the [Listen Connection] convert to an internal PUB message and distribute to subscribers (forwarding the original NOTIFY message).

[Diagram: PGBounce PubSub]

Each PostgreSQL Client is configured to run as a Transaction Pooled connection. When a command is running, a new PostgreSQL connection is used from a pool of connections as usual. When the Client is idle, the pool connection to PostgreSQL is closed as usual, but the [Listen Connection] remains open. If a NOTIFY comes through the [Listen Connection] it can be delivered to the PostgreSQL Client.
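The bookkeeping behind this proposal can be illustrated with a minimal sketch. It is not Odyssey code: the class name `ListenRegistry`, the integer client ids and the `send_to_listen_connection` callback are all hypothetical, and sending UNLISTEN once the last subscriber disconnects is an assumption that the proposal itself does not spell out.

```python
class ListenRegistry:
    """Hypothetical PUB/SUB table: channel name -> set of subscribed client ids."""

    def __init__(self, send_to_listen_connection):
        # Assumed callback that writes one SQL command to the shared [Listen Connection].
        self._send = send_to_listen_connection
        self._subscribers = {}

    @staticmethod
    def _quote(channel):
        # Quote the channel as an SQL identifier so its exact spelling is preserved.
        return '"' + channel.replace('"', '""') + '"'

    def subscribe(self, client_id, channel):
        subscribers = self._subscribers.setdefault(channel, set())
        if not subscribers:
            # Only the first subscriber of a channel triggers a LISTEN upstream.
            self._send("LISTEN " + self._quote(channel))
        subscribers.add(client_id)

    def disconnect(self, client_id):
        for channel in list(self._subscribers):
            subscribers = self._subscribers[channel]
            subscribers.discard(client_id)
            if not subscribers:
                # Assumption: stop listening when nobody is left on the channel.
                del self._subscribers[channel]
                self._send("UNLISTEN " + self._quote(channel))

    def recipients(self, channel):
        # Clients that should receive a copy of a notification on this channel.
        return sorted(self._subscribers.get(channel, ()))
```

With this shape, 200 clients listening on the same channel cost one LISTEN on the server side, and a notification is fanned out by looking up `recipients(channel)`.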

Outcome and Usage

[Diagram: DesiredOutcome]

Instead of having 1:1, one PostgreSQL Connection for every Waiting Client connection, it's now *:1, where Odyssey handles multiple async connections and when they are idle, all NOTIFY messages are handled through a single shared [Listen Connection].

Protocol references

https://www.postgresql.org/docs/9.3/sql-listen.html

There is only one standard command to listen: LISTEN. There are no pg_ functions to accomplish the same.

https://www.postgresql.org/docs/9.3/sql-notify.html

There are two standard ways to notify: the NOTIFY command and the pg_notify function.

The queue is quite large (8GB in a standard installation) and should be sufficiently sized for almost every use case. However, no cleanup can take place if a session executes LISTEN and then enters a transaction for a very long time.

Odyssey will have a [Listen Connection] which is never within a transaction. Therefore, this issue is mitigated. However, Odyssey SHOULD only deliver to Odyssey-clients after they are detected to have completed a transaction. It should also be possible to configure Odyssey for "immediate delivery", because many PostgreSQL client libraries probably already support async Notify signalling. The Rust tokio one does, for example, while the popular C# one doesn't. All PostgreSQL libraries should be programmed to support async delivery, and should enable a Latch implementation (either internal or external to the core library).
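The two delivery policies described here could be sketched as follows. This is only an illustration under stated assumptions: `ClientDelivery`, its `write` callback and the `immediate` flag are hypothetical names, and how Odyssey would learn the client's transaction state is left outside the sketch.

```python
from collections import deque


class ClientDelivery:
    """Hypothetical per-client buffer for notifications from the [Listen Connection]."""

    def __init__(self, write, immediate=False):
        self._write = write            # assumed callback that sends bytes to the client
        self._immediate = immediate    # the "immediate delivery" option from the text
        self._in_transaction = False
        self._pending = deque()

    def on_notification(self, message):
        if self._immediate or not self._in_transaction:
            self._write(message)
        else:
            # Hold the message until the client's transaction has completed.
            self._pending.append(message)

    def on_transaction_state(self, in_transaction):
        self._in_transaction = in_transaction
        if not in_transaction:
            # Flush in arrival order once the client is idle again.
            while self._pending:
                self._write(self._pending.popleft())
```

Deferred delivery keeps notifications from arriving in the middle of a client's query results, while the immediate mode suits client libraries that already handle asynchronous messages at any time.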

https://www.postgresql.org/docs/9.3/protocol-flow.html#PROTOCOL-ASYNC

48.2.6... If a frontend issues a LISTEN command, then the backend will send a NotificationResponse message (not to be confused with NoticeResponse!) whenever a NOTIFY command is executed for the same channel name.

https://www.postgresql.org/docs/10/protocol-flow.html#id-1.10.5.7.4

52.2.2. Simple Query...A simple query cycle is initiated by the frontend sending a Query message to the backend.

https://www.postgresql.org/docs/10/protocol-message-formats.html

  • Query from frontend: Byte("Q") + Int32(message length in bytes, counting the length field itself but not the type byte) + Bytes(Null-terminated C String)
  • NotificationResponse from backend: Byte("A") + Int32(message length, counted the same way), followed by:
    -- The process ID of the notifying backend process (Int32)
    -- The name of the channel that the notify has been raised on (Null-terminated C String)
    -- The optional payload string passed from the notifying process (Null-terminated C string)

(https://www.postgresql.org/docs/10/protocol-message-types.html)
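To make the two message layouts concrete, here is a small sketch that builds a Query message and decodes a NotificationResponse. The function names are hypothetical and UTF-8 is assumed for the strings. Note that the Int32 length counts itself and everything after it, but not the leading type byte.

```python
import struct


def encode_query(sql):
    """Build a frontend Query ('Q') message carrying one SQL string."""
    body = sql.encode("utf-8") + b"\x00"
    # Length = 4 bytes for the length field itself + the null-terminated string.
    return b"Q" + struct.pack("!I", 4 + len(body)) + body


def decode_notification(message):
    """Split a backend NotificationResponse ('A') into (pid, channel, payload)."""
    if message[:1] != b"A":
        raise ValueError("not a NotificationResponse")
    (length,) = struct.unpack("!I", message[1:5])
    contents = message[5:1 + length]
    (pid,) = struct.unpack("!i", contents[:4])
    # Two null-terminated strings follow the process ID.
    channel, payload, _rest = contents[4:].split(b"\x00", 2)
    return pid, channel.decode("utf-8"), payload.decode("utf-8")
```

For example, `encode_query("LISTEN a")` yields a 14-byte message whose length field is 13, and a notification with an empty payload decodes to an empty string rather than failing.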

Process for a single Odyssey client

[Diagram: Listen-Notify-Sequence4]

When a LISTEN command is issued from the client:

  • Odyssey inspects SimpleQuery messages for LISTEN commands and registers the issuing client connection against the topic named in each one
  • Odyssey maintains the [Listen Connection] and forwards the first LISTEN command it receives for each distinct topic
  • Odyssey receives NotificationResponse messages, maps them to the registered connections and sends a copy to each of those Odyssey-connected clients.
  • Those clients might then react to such an async Notify message, for example by running a SQL query to see if there is new data to process.

Important Source Code Information for SimpleQuery

  • KIWI_FE_QUERY relates to the SimpleQuery message ('Q')
  • KIWI_FE_QUERY is checked in frontend.c:639. This goes to od_console_query in console.c:1644. This will log the query if configured to do so [:1667]. The command is then parsed, and several commands are supported for special treatment. This list is found at console.c:34 in od_console_keywords { kill_client, reload, show, stats, servers, clients, lists, set, pools, pools_extended, databases, create, module, errors, errors_per_route, frontend, router, drop, version, listen }. listen is an interesting inclusion, and is related to OD_LLISTEN.
  • OD_LLISTEN is used for reading the config for the server port for Odyssey to listen to. Not relevant.
  • OD_LLISTEN is also used within od_console_show likely as a special SQL command to tell the client what port that server is listening on. This might clash when we are trying to detect the legitimate LISTEN keyword. od_console_show is only called after finding the SHOW token (OD_LSHOW). eg. SHOW LISTEN;. This means there SHOULD NOT be a clash.
  • (CREATE MODULE seems to be a supported path. This seems to load modules dynamically by file path of Odyssey)
  • Therefore, console.c is where a new CASE option should be added to process when the first token is LISTEN. This is in the od_console_query function. A new function will need to then be called which will tokenise the channel name.

Important Source Code Information for NotificationResponse

  • KIWI_BE_NOTIFICATION_RESPONSE relates to the NotificationResponse message ('A')
  • KIWI_BE_NOTIFICATION_RESPONSE is currently not enumerated anywhere in code
  • backend.c does have some Backend Message parsing during startup od_backend_startup, but that's not relevant for NotificationResponse.
  • There is a separate od_backend_ready_wait function, though. It seems to be checked within a normal query/response cycle in od_backend_query, so it is probably not relevant.
  • I tried to trace a NoticeResponse but apparently these are not identified and processed. However I am quite sure that they would be forwarded.
  • od_relay_process seems to arbitrarily forward packets (presumably from BE to FE) in relay.h. This is called from od_relay_pipeline, from od_relay_step, and finally by od_frontend_remote in frontend.c:954
  • Looking at frontend.c:969+970, it would seem that od_relay_attach is called twice, once for each direction (FE to BE and BE to FE). Although all that function seems to do is assign. There are already two relay objects, and od_relay_attach simply does relay->dst = dst. (Along with an assert line). Interestingly, these assignments are done AFTER the call to od_relay_start. But the od_relay_step is called after, so that's probably when it actually starts, in od_relay_step.
  • Therefore, od_relay_process seems to be the best place to intercept KIWI_BE_NOTIFICATION_RESPONSE from the backend. This is where to hook in and implement our pub/sub helper to enable Transaction-Mode NOTIFY/LISTEN capabilities.

Other considerations

  • How does Odyssey detect if a connection is still within a transaction? Is that implied by Odyssey itself? Or does Odyssey simply poll PostgreSQL to determine this? The latter would seem more resilient.
  • If a connection to Odyssey is Session-Mode (not Transaction-mode), then there MUST NOT be any LISTEN/NOTIFY optimisations. The Session-Mode connection must operate as per normal.
  • In order to support 1000x different db-users that connect occasionally: Does Odyssey have an aggressive mode where connections are closed when not used, instead of being returned to a pool?
  • In order to support 1000x different db-users that connect occasionally: Are different Usernames connections given a separate user-connection-pool? I would prefer to have a single simple global-connection-pool, and then have Odyssey run commands to change the user context. (Although I don't think that's possible with PostgreSQL, not in a way that prevents the user switching back to the original login) The wire messages don't seem to support it; nor do the SQL commands. However, it's possible that PostgreSQL does support additional Startup messages. In any case, idle pool connections should be closed if the MAX_CONNECTION limit is being reached.
  • We would only support the LISTEN command, and not any other indirect way of registering for LISTEN (such as a database stored function).
  • We will assume that LISTEN channels are case-sensitive. (From my experience this is correct).
  • Odyssey doesn't have a donation point or specified bounty marketplace. I will likely use a 3rd party bounty marketplace or Upwork to pay for this; or I will have my team build it.

Hi!
We don't have a donation system, but would happily accept the Pull Request. Also, I can help with figuring out what to change.
AFAIK the LISTEN command goes as a usual SimpleQuery packet. To understand that a query is issuing LISTEN we have to parse the query, which does not sound scalable...

BTW in Odyssey, you can set up session pooling for a single user. This user might send LISTEN commands, while other users are in transaction pooling mode. Does this help?

@x4m Awesome! Thanks for your suggestions. I will dig a bit deeper now to specify the design.

BTW in Odyssey, you can set up session pooling for a single user. This user might send LISTEN commands, while other users are in transaction pooling mode. Does this help?

Not for the problem being solved here. I have updated the original post to clarify that a lot more: "For situations where more than 200 separate client programs need to wait for a PostgreSQL Notify (NotificationResponse) message.". If there are 200 separate programs, there needs to be 200 session-mode connections.

My proposal is that they connect with transaction-mode once each. This connection can await a NOTIFY message, then when that is received, they can use the same connection to send a SQL query.

I have updated the Original-Post with a lot more details

I'll try to answer as many questions as possible:

How does Odyssey detect if a connection is still within a transaction?

From time to time Odyssey receives a ReadyForQuery message from the server. This message contains a transaction status property. If we are not in a transaction, then from the end of the ReadyForQuery message the server connection is detached and can be used by another client. A single byte from the client attaches it to a server connection again.
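As an illustration of the check described in this answer, the sketch below reads the status byte of a ReadyForQuery message. In the PostgreSQL protocol that byte is 'I' (idle), 'T' (in a transaction block) or 'E' (in a failed transaction block). The function name `server_is_idle` is hypothetical and this is not how Odyssey's own code is organised.

```python
import struct


def server_is_idle(message):
    """Return True when a ReadyForQuery ('Z') message reports no open transaction."""
    if len(message) != 6 or message[:1] != b"Z":
        raise ValueError("not a ReadyForQuery message")
    (length,) = struct.unpack("!I", message[1:5])
    if length != 5:
        raise ValueError("unexpected ReadyForQuery length")
    status = message[5:6]
    if status not in (b"I", b"T", b"E"):
        raise ValueError("unknown transaction status")
    # Only the idle state allows the server connection to be detached.
    return status == b"I"
```

The same signal is what a deferred notification delivery would wait for before forwarding queued NotificationResponse messages to a client.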

If a connection to Odyssey is Session-Mode (not Transaction-mode), then there MUST NOT be any LISTEN/NOTIFY optimizations. The Session-Mode connection must operate as per normal.

Session connections are attached to the server from the first byte from the client after authentication. Session connections are never detached from the server.

In order to support 1000x different db-users that connect occasionally: Does Odyssey have an aggressive mode where connections are closed when not used, instead of being returned to a pool?

Connections are returned to the pool when not in transaction. Also, there's a server timeout for "idle in transaction". Unused connections from a pool are closed after pool_ttl seconds.

In order to support 1000x different db-users that connect occasionally: Are different Usernames connections given a separate user-connection-pool?
Yes.
I would prefer to have a single simple global-connection-pool, and then have Odyssey run commands to change the user context. (Although I don't think that's possible with PostgreSQL, not in a way that prevents the user switching back to the original login) The wire messages don't seem to support it; nor do the SQL commands. However, it's possible that PostgreSQL does support additional Startup messages. In any case, idle pool connections should be closed if the MAX_CONNECTION limit is being reached.

You really want to make the server run far below max_connections. TPS throughput will be much higher. But it would require very hard patching of Postgres to change user. E.g. a user can be authenticated by a TLS cert; in this case, you need to restart the encryption. That is more than half of the cost of just creating a new server connection.

We would only support the LISTEN command, and not any other indirect way of registering for LISTEN (such as a database stored function).

What about comments in front of LISTEN keyword? What if the query is just not a valid query? ("LISTENLISTENLISTEN sadgfhsdfg;")
If you implement any kind of query parsing - please make it off by default. We really do not want to incur any perf penalty on existing users. Odyssey tries to pack as many small packets into one network packet as possible. Parsing, possibly, will break this.
Also, there is an extended protocol. I'm not sure it works with LISTEN ("LISTEN '$1").

Odyssey doesn't have a donation point or specified bounty marketplace. I will likely use a 3rd party bounty marketplace or Upwork to pay for this; or I will have my team build it.

Well, we have a team of Ural Federal University students working on our projects. E.g. they are doing quite well in WAL-G. Probably one of them could undertake this project. If you wish I'll try to summon them (I hope this is the correct English idiom, I don't mean any dark rituals). Surely, if the team behind the project is the very same folks who will use the feature - it would be best.

Thanks @x4m. Do you have a Gitter or something as well that you use for discussion?

Thanks for the answers, they are really helpful.

You really want to make the server run far below max_connections. TPS throughput will be much higher. But it would require very hard patching of Postgres to change user. E.g. a user can be authenticated by a TLS cert; in this case, you need to restart the encryption. That is more than half of the cost of just creating a new server connection.

I'm working on it. I can't link you to a PostgreSQL Issue though, because they only use mailing lists at the moment. See https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing - where I am summarising everything. I will add in a TODO for more complex authentication mechanisms.

What about comments in front of LISTEN keyword? What if the query is just not a valid query? ("LISTENLISTENLISTEN sadgfhsdfg;")
If you implement any kind of query parsing - please make it off by default. We really do not want to incur any perf penalty on existing users. Odyssey tries to pack as many small packets into one network packet as possible. Parsing, possibly, will break this.
Also, there is an extended protocol. I'm not sure it works with LISTEN ("LISTEN '$1").

Absolutely - parsing will be off by default. The implementation will be quite strict and discriminating to begin with. The SimpleQuery will need to "begin" with "LISTEN ", and will not support parameters.
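A strict matcher along these lines might look like the sketch below. It is an assumption about how the rule could be applied, not the planned implementation: `parse_listen` is a hypothetical name, and the identifier handling follows PostgreSQL's general rule that an unquoted identifier is folded to lower case while a double-quoted one keeps its spelling, which matters for the case-sensitivity assumption made earlier in this issue.

```python
import re

# The whole query must be exactly one LISTEN statement with nothing in front of it.
_LISTEN = re.compile(
    r'''LISTEN[ ]+                      # keyword followed by at least one space
        (?: "((?:[^"]|"")+)"            # quoted identifier; "" stands for one quote
          | ([A-Za-z_][A-Za-z0-9_$]*)   # or a plain identifier
        )
        [ ]*;?[ ]*\Z                    # optional semicolon, then end of input
    ''',
    re.IGNORECASE | re.VERBOSE | re.ASCII,
)


def parse_listen(sql):
    """Return the channel name for a strict LISTEN query, or None otherwise."""
    match = _LISTEN.match(sql)
    if match is None:
        return None
    quoted, plain = match.groups()
    if quoted is not None:
        return quoted.replace('""', '"')
    # Unquoted identifiers are folded to lower case, as PostgreSQL does.
    return plain.lower()
```

Anything that returns None, including a leading comment, "LISTENLISTENLISTEN sadgfhsdfg;" or a parameter such as "$1", would simply be forwarded to the server unchanged, as any other query is.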

Ideally, for that reason, we would additionally add a custom PQ message for this. That could be enabled by default, and the clients might implement it. (SQL should really be Binary-SQL, but that's another matter).

I don't think parsing will break packet packing. Each message still arrives from the client in logical order. The parsing will occur then - on incoming. The default is that the message is still forwarded as usual. But additionally, it will be configured on the [Listen Connection].

Well, we have a team of Ural Federal University students working on our projects.

That sounds promising. That would handle the "Staffing" part. I will probably offer a cash bounty and leave it publicly open; those students would be in a good position to win the bounty.