matrix-org / sydent

Sydent: Reference Matrix Identity Server

Home Page:http://matrix.org

Consider supporting PostgreSQL (and switching to it for matrix.org/vector.im)

reivilibre opened this issue · comments

Some of the recent problems with running the casefolding migration script against the live SQLite database have led us to think about whether we should support PostgreSQL as an alternative to SQLite (and use it in production).

Advantages

  • As a team, we generally seem more experienced with Postgres
    • this extends to things like reading query plans
  • Postgres' concurrency model (multi-version concurrency control) lends itself better to running these out-of-process jobs without locking up the main Sydent process.
    • even after thinking through SQLite's locks for this job, we haven't deduced why it does not work properly. (It may be a SQLite bug, but assuming it's not, we appear to have a better grasp of Postgres' concurrency model.)
  • Dubious: I can see some people being keen on this for 'scalability' reasons but I'm not aware of there being a huge load placed on Sydent anyway.
  • Dubious: Postgres supports things like replication, which may be useful for operating a reliable service — not sure if we care about that

Disadvantages

  • Obviously, it's potentially more code to support. We'd want to expand CI to run against Postgres.
  • If we needed to, moving back in the other direction seems less simple (I can't immediately find an off-the-shelf tool, though equally it may be possible to just create an empty Sydent schema in SQLite and only dump/load the table contents.)

Why adding support for PostgreSQL may not be so bad

  • SQLite is generally inspired by Postgres for its SQL dialect. Whilst not strictly true, it's almost as though Postgres SQL is a superset of SQLite SQL — I expect there to be very few (if any) query changes necessary. Moving in the other direction is harder because SQLite is pickier, especially around schema changes.
    • essentially: if you write your queries for SQLite then they probably work on Postgres
  • Database migration can likely be done out-of-the-box with a tool such as pgloader, which converts from several database and file formats (including SQLite3) to Postgres
    • it seems to handle indexes and NOT NULL properly.
  • The database is only small so we can probably do migration with a small amount of downtime without needing to worry too much
  • We only do fairly basic operations to the database so it's unlikely we would need to write engine-specific queries

> Obviously, it's potentially more code to support. We'd want to expand CI to run against Postgres.

Only if you want to support both DBs in parallel. (But SQLAlchemy exists to mitigate that somewhat anyway.)

Another reason I'm more comfortable with Postgres: I found out that sqlite3's .dump is dangerous because it doesn't preserve PRAGMA user_version. When you then start up a new Sydent on the restored backup, it goes and corrupts itself. I did this on a copy of the backup, so I didn't lose anything, but this sort of thing is concerning. In contrast, I have more faith in pg_dump, which I have used time and time again.
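For anyone who needs a SQLite backup in the meantime, here is a minimal sketch of a copy that does keep user_version. It assumes Python's standard sqlite3 module and uses Connection.backup, which copies database pages (including the header where user_version is stored) rather than replaying SQL text. The table name, the version number 7 and the helper names are made up for illustration; this is not how Sydent itself takes backups.

```python
import sqlite3


def backup_preserving_user_version(
    src: sqlite3.Connection, dest: sqlite3.Connection
) -> int:
    """Copy src into dest page by page and return dest's user_version."""
    # Connection.backup copies whole database pages, header included,
    # so user_version does not have to be re-applied by hand.
    src.backup(dest)
    (version,) = dest.execute("PRAGMA user_version").fetchone()
    return version


def demo() -> int:
    # Hypothetical source database with a non-zero schema version.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, value TEXT)")
    src.execute("INSERT INTO example (value) VALUES ('a')")
    src.execute("PRAGMA user_version = 7")
    src.commit()

    dest = sqlite3.connect(":memory:")
    try:
        return backup_preserving_user_version(src, dest)
    finally:
        src.close()
        dest.close()
```

Checking PRAGMA user_version on the restored copy before pointing a service at it is a cheap sanity check whichever backup method is used.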

I would also go so far as to say that maybe we should use a proper SQL migration library in Sydent instead of rolling our own, but oh well.

> I would also go so far as to say that maybe we should use a proper SQL migration library in Sydent instead of rolling our own, but oh well.

I've had neutral to positive experiences with SQLAlchemy + Alembic. But I don't think Sydent's DB changes all that much; I doubt the investment would be worth it. (Unless it was a testbed for Synapse...)

In our case, we have high usage of Sydent: the Sydent process is regularly at 100% CPU on some of our deployments (we run 17 replicated Sydent servers).
Also, SQLite doesn't make backups any simpler.

In terms of the work required, off the top of my head one would need to:

  • Set up config/env vars for postgres
  • Go through each query that SQLite currently executes and check that the same results would be returned by PostgreSQL. If not, try to rewrite it without sacrificing performance. As a last resort, pick separate queries to use based on the currently in-use database engine.
  • Write an SQLite → PostgreSQL migrator.
  • Get the tests running on PostgreSQL (unit tests are probably fine as a start).
  • Write up documentation for using it.

So no small feat, but the codebase is fairly small and self-contained. I count 82 instances of cur.execute( in the source code, and the hope is that most of those queries will already “just work” with PostgreSQL.
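One difference that would touch most of those cur.execute( calls even when the SQL itself is portable is the placeholder style: Python's sqlite3 module uses `?` (qmark) parameters, while psycopg2 uses `%s` (pyformat). The sketch below is a deliberately naive adapter under the assumption that psycopg2 would be the Postgres driver, which this thread has not decided. The function names and the engine strings are hypothetical, and the plain string replacement would mangle a literal `?` inside a quoted SQL string.

```python
def qmark_to_pyformat(query: str) -> str:
    """Rewrite sqlite3-style '?' placeholders as psycopg2-style '%s'.

    Naive on purpose: it does not parse SQL, so a '?' inside a quoted
    literal would be rewritten too.
    """
    # Escape literal percent signs first, because pyformat drivers treat
    # '%' as special, then swap the placeholders.
    return query.replace("%", "%%").replace("?", "%s")


def adapt_query(query: str, engine: str) -> str:
    """Return the query in the placeholder style of the given engine.

    'sqlite' and 'postgres' are hypothetical config values.
    """
    if engine == "sqlite":
        return query
    if engine == "postgres":
        return qmark_to_pyformat(query)
    raise ValueError(f"unsupported database engine: {engine}")
```

Routing every query through one helper like this would keep the "separate queries per engine" case as the exception rather than the rule.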

One should also be wary of subtle differences between SQLite and PostgreSQL that may cause bugs.
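As one illustration of such a difference: SQLite's default (non-STRICT) tables use type affinity, so a non-numeric string inserted into an INTEGER column is silently stored as text, whereas PostgreSQL would reject the same insert with an error. The sketch below only exercises the SQLite side with Python's standard sqlite3 module; the table and function names are made up and nothing here is taken from Sydent's schema.

```python
import sqlite3


def stored_type_of_text_in_integer_column() -> str:
    """Insert a non-numeric string into an INTEGER column and report
    the storage type SQLite actually used for it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE example (n INTEGER)")
        # SQLite accepts this; the value cannot be converted to an
        # integer, so it is stored as text.
        conn.execute("INSERT INTO example (n) VALUES ('not a number')")
        (stored_type,) = conn.execute(
            "SELECT typeof(n) FROM example"
        ).fetchone()
        return stored_type
    finally:
        conn.close()
```

Rows like this would be worth looking for before a migration, since any that exist in a live SQLite database would fail to load into a strictly typed PostgreSQL column.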

It would also be nice to define minimum supported versions of each database engine. I propose just copying Synapse’s: https://matrix-org.github.io/synapse/latest/deprecation_policy.html