Explicit revision sequence?

Question

Explicit revision sequence?

kevlarr opened this issue 4 years ago · comments

Unlike pure timestamp-based revision tools, alembic requires each revision to explicitly state which it follows, and it errors if two point to the same "down" revision. Unfortunately, all alembic tracks is the last version applied, which is actually problematic if...

master branch has revision A
Alice checks out feature-1 branch and writes revision B
Bob checks out feature-2 branch and writes revision C
Alice merges feature-1 to master and creates history A <- B
CI runs revisions
Bob merges master into feature-2 to resolve conflicts and effectively now has two histories A <- B | A <- C
Bob resolves this by updating revision B to point to revision C instead of revision A, such that A <- C <- B
Bob merges feature-2 into master
CI runs revisions

In this situation, because the alembic_version stored in the database (from step 4) is B, alembic assumes that anything before it has already been applied and will actually never execute C. Sure, this is kind of a breakdown in workflow and (hopefully) doesn't happen often in practice, but it's potentially hard to spot for subtler changes (like adding an index, changing a check constraint, etc) that might not break tests.

Unfortunately, the simpler timestamp approach inspired by active-record or dbmate (which is great for sorting through revisions) that is currently used is also problematic because it absolutely allows applying revisions in different orders.

For instance, in the above example, both Alice and Bob could merge their revisions into master, but if Bob merged first and CI ran migrations prior to Alice merging (for instance, in a dev server) then revisions would ultimately have been executed in the order A -> C -> B. If master is then merged into a release branch or something, both revisions would be present at the same time and would be run in the order A -> B -> C since B has an earlier timestamp.

In the unfortunate situation where both Alice's and Bob's revisions touched a similar object (table, index, etc) then any database migrated by master could easily differ from any database migrated via release.

So... what's the answer?

kevin · Answer 1 · Fri Apr 09 2021 20:51:57 GMT+0800 (China Standard Time)

Tangentially related to #18

kevin · Answer 2 · Fri Apr 09 2021 20:56:33 GMT+0800 (China Standard Time)

I think the ideal approach would be to ditch timestamps and just do a straight sequence...

001-rev-1.sql
002-rev-2.sql
...

Using a sequence gives ordering of timestamps (as opposed to the alembic mess) but it enforces a linear order by default without requiring SQL files to have pointers to other files, and it doesn't allow room for "interleaving" like with timestamps. Plus, timestamps would also require pointers, in which case the timestamps themselves don't accurately model the order of revisions, so...

If people merge conflicting histories (or journeys, to stick with the cheesy language) then they just change filenames, rather than pointers in-file.

If a revision that's already been applied is changed to occur after a new revision, things will already break (because jrny won't find the old revision file) and that's good
Unapplied revisions must always then be added after applied ones

CAVEAT

Switching to a sequence is not backwards-compatible

kevin · Answer 3 · Thu Apr 22 2021 23:32:30 GMT+0800 (China Standard Time)

Added via #19