tattle-made / DAU

MCA Tipline for Deepfakes

Identify Optimum PostGres settings for launch

dennyabrain opened this issue

One of the bottlenecks in the infra is the number of concurrent writes our database can support.
One of the scenarios we want to be prepared for is receiving ten lakh (1,000,000) messages over an hour, i.e. roughly 280 writes per second on average. We have strategies in place to scale our web servers vertically or horizontally. That means these web servers would open connections to our Postgres instance. The database client library used in the web server (Ecto) has built-in support for managing connection pools.
The scope of this feature is to find out:

  1. For a given amount of RAM, how many simultaneous connections a Postgres server can handle
  2. Best practices for configuring connection pooling in Ecto (the database library used in the web server)
  3. The ideal combination of Postgres configuration, web server count, and connection pool size to handle our usage scenario

source

With Postgres, each new connection can take up to 1.3 MB of memory.
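
As a quick sanity check, the server's configured connection limit and memory-related settings can be inspected from Elixir via Ecto's SQL adapter. A minimal sketch, assuming a repo module named DAU.Repo (a hypothetical name for this project's Ecto repo):

```elixir
# Inspect Postgres connection and memory settings through the Ecto repo.
# DAU.Repo is a hypothetical repo module name; substitute the real one.
for setting <- ["max_connections", "shared_buffers", "work_mem"] do
  %{rows: [[value]]} = Ecto.Adapters.SQL.query!(DAU.Repo, "SHOW #{setting}", [])
  IO.puts("#{setting}: #{value}")
end
```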

One thing that works in our favour is that we need this high throughput for only one specific write operation: writing incoming WhatsApp messages into the database. Since these are pure inserts, we don't have to worry about any update-collision issues.
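
Because the hot path is insert-only, one option is to batch incoming messages and write them with a single multi-row insert, which cuts round trips and shortens the time each pooled connection is held. A minimal sketch, assuming a hypothetical DAU.Message schema and DAU.Repo:

```elixir
# Batch insert-only writes for incoming WhatsApp messages.
# DAU.Message and DAU.Repo are hypothetical names for illustration.
defmodule DAU.Inbox do
  def store_batch(messages) when is_list(messages) do
    now = DateTime.utc_now() |> DateTime.truncate(:second)

    rows =
      Enum.map(messages, fn msg ->
        %{
          sender: msg.sender,
          payload: msg.payload,
          inserted_at: now,
          updated_at: now
        }
      end)

    # insert_all writes all rows in one statement, holding one
    # pooled connection briefly instead of one checkout per message.
    DAU.Repo.insert_all(DAU.Message, rows)
  end
end
```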

Tradeoffs for client-side pooling (e.g. Ecto) vs. middleware (external) pooling (e.g. PgBouncer, Pgpool)

My understanding is that using something like PgBouncer helps in situations where you have to create and remove database connections quickly, or where you want more client connections than your database instance is actually configured to support.

One example I personally came across was when using Elixir together with Kubernetes. During some of our operations we would spawn many Kubernetes pods at the same time, and all of them would try to get database connections together. At those times it was easy to go over the number of database connections that our Postgres instance actually supported.
From my experience and understanding, the disadvantages of an external pool are:
- additional failure point (including from a security standpoint)
- additional latency
- additional complexity in deployment
- security complications w/ user credentials

Usually, a connection pool on the application side is a good thing for the reasons you detail. An external connection pool only makes sense if
- your application server does not have a connection pool
- you have several (many) instances of the application server, so that you cannot effectively limit the number of database connections with a connection pool in the application server
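
To make the second point concrete: with application-side pooling alone, Postgres sees roughly instances × pool_size connections in total, so the per-instance pool has to be sized against the server-wide limit. A back-of-the-envelope sketch, with all numbers illustrative:

```elixir
# Back-of-the-envelope connection budget; all values are illustrative.
max_connections = 100   # Postgres max_connections setting
reserved = 10           # headroom for superuser/maintenance sessions
instances = 6           # web server instances (e.g. Kubernetes pods)

pool_size = div(max_connections - reserved, instances)
IO.puts("pool_size per instance: #{pool_size}")
# => pool_size per instance: 15
```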

Other relevant info

When applications reach a certain scale, a single database may not be enough to sustain the required throughput. In such scenarios, it is very common to introduce read replicas: all write operations are sent to the primary database and most of the read operations are performed against the replicas. The credentials of the primary and replica databases are typically known upfront by the time the code is compiled.
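
The Ecto documentation sketches this with one primary repo plus read-only replica repos; a condensed version of that pattern (module names are the docs' placeholders, not this project's):

```elixir
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  @replicas [MyApp.Repo.Replica1, MyApp.Repo.Replica2]

  # Writes go to the primary; reads can pick a random replica.
  def replica, do: Enum.random(@replicas)

  for repo <- @replicas do
    defmodule repo do
      use Ecto.Repo,
        otp_app: :my_app,
        adapter: Ecto.Adapters.Postgres,
        read_only: true
    end
  end
end
```

Writes then go through MyApp.Repo, while reads that can tolerate replication lag go through MyApp.Repo.replica().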
`:pool_size` - the size of the pool used by the connection module. Defaults to `10`.
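
For reference, the pool is configured on the repo; a minimal sketch with illustrative values (the app/repo names are hypothetical, and `queue_target`/`queue_interval` are DBConnection's checkout-queue knobs):

```elixir
# config/config.exs — illustrative values, not recommendations.
config :dau, DAU.Repo,
  pool_size: 20,        # connections held open per running instance
  queue_target: 50,     # ms a checkout may wait before the pool is considered slow
  queue_interval: 1000  # ms window over which queue_target is evaluated
```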

Max number of connections Postgres can support

You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.

See the section "How to Find the Optimal Database Connection Pool Size".
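
For reference, the heuristic commonly cited in such discussions (it originates from the PostgreSQL wiki's pool-sizing guidance) is `pool_size = (core_count * 2) + effective_spindle_count`, i.e. the optimal pool is usually far smaller than the number of concurrent users, since each connection can only make progress on one core at a time.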

Instructions on how to run pgbench for profiling insert operations - https://github.com/tattle-made/feluda/wiki/Optimization#testing-insert-performance