elixir-ecto / db_connection

Database connection behaviour

Home Page:http://hexdocs.pm/db_connection/DBConnection.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add ability for lazy database connections pooling

bartekupartek opened this issue · comments

Last year with Ecto 3.1.0 it was released dynamic Repo which allowed me to connect successfully over 80 multi tenant applications to databases inside single (replicated) Phoenix application based on subdomains.


Nowadays it started encountering issues with lack of available pool connections in RDS Postgres instance. For example I’ve had to scale rolling updates deployments to single instance to release db connections and make place to deploy new version with migrations to be passed by all tenants databases. Partially it has been solved by narrowing pool size to 2 for tenants with small traffic. Recently I’ve setup larger db instance and now it is expensive db.r5.xlarge with almost drained open connections during deployment and with wasted cpu/memory utilisation, furthermore the healthcare app I'm working on is constantly growing.

My question is 
if 
would it be possible to establish connections on demand with given live time on requests peak, until maximum pool size would be reached? Dynamic connection would be cached and released after some idle time.

My research brought me to the pgBouncer but I don’t feel it would be good enough because it can run only single transaction or it needs extra server instance.
I found this problem as unresolved in Elixir community 
Info on implementing custom Ecto.Pool or 
open a connection on the fly?.
I’ve noticed that desired feature was present in earlier versions of 
db_connection with use of DBConnection.Poolboy and :pool_overflow or :max_overflow options, here's a great outdated topic.
I found also some outdated attempt to solve this issue ecto_pools_connection_cache

Where to start?

The tricky thing about lazy connection polling is knowing exactly when to start a new connection. If you have a pool at 10, spawning a new connection as soon as you have one extra query may not be efficient at all, it may be better to simply wait a bit and get one of the connection in the pool. Then you also need to consider when is the best time to shed those connections, which is not trivial either.

The poolboy approach is extremely efficient for database connections, as they take a while to setup, and they would be use at least once. So what happened with Poolboy was that, as soon as you entered overflow mode, performance actually degraded due to all recycling.

The dynamic repo feature was mostly designed around multiple instances. If you are still connecting on the same server, I would strongly consider using database schemas instead (Ecto calls them prefixes). In those cases, what I usually recommend is to set globally a schema that does not exist, forcing you to pick the proper prefix. You can also give invalid prefixes to the schemas too, making sure you are picking the correct one.

But lazy database connection is useful to not stress the database. A good reference to read https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing

Allow setup a "min" and "max" is useful to:

  • not stress the database unnecessarily, avoiding creating more connections than is really needed just because some peak.
  • is easier to define a good autoscale configuration based on cpu utilization in hybrid application and mix IO and CPU bounds tasks
  • remove the necessity to use a pgbouncer that will lose the prepared statement feature