Explicit revision sequence?
kevlarr opened this issue · comments
Unlike pure timestamp-based revision tools, alembic
requires each revision to explicitly state which it follows, and it errors if two point to the same "down" revision. Unfortunately, all alembic
tracks is the last version applied, which is actually problematic if...
master
branch has revisionA
- Alice checks out
feature-1
branch and writes revisionB
- Bob checks out
feature-2
branch and writes revisionC
- Alice merges
feature-1
tomaster
and creates historyA <- B
- CI runs revisions
- Bob merges
master
intofeature-2
to resolve conflicts and effectively now has two historiesA <- B | A <- C
- Bob resolves this by updating revision
B
to point to revisionC
instead of revisionA
, such thatA <- C <- B
- Bob merges
feature-2
intomaster
- CI runs revisions
In this situation, because the alembic_version
stored in the database (from step 4) is B
, alembic
assumes that anything before it has already been applied and will actually never execute C
. Sure, this is kind of a breakdown in workflow and (hopefully) doesn't happen often in practice, but it's potentially hard to spot for subtler changes (like adding an index, changing a check constraint, etc) that might not break tests.
Unfortunately, the simpler timestamp approach inspired by active-record or dbmate (which is great for sorting through revisions) that is currently used is also problematic because it absolutely allows applying revisions in different orders.
For instance, in the above example, both Alice and Bob could merge their revisions into master
, but if Bob merged first and CI ran migrations prior to Alice merging (for instance, in a dev server) then revisions would ultimately have been executed in the order A -> C -> B
. If master
is then merged into a release
branch or something, both revisions would be present at the same time and would be run in the order A -> B -> C
since B
has an earlier timestamp.
In the unfortunate situation where both Alice's and Bob's revisions touched a similar object (table, index, etc) then any database migrated by master
could easily differ from any database migrated via release
.
So... what's the answer?
I think the ideal approach would be to ditch timestamps and just do a straight sequence...
001-rev-1.sql
002-rev-2.sql
...
Using a sequence gives ordering of timestamps (as opposed to the alembic
mess) but it enforces a linear order by default without requiring SQL files to have pointers to other files, and it doesn't allow room for "interleaving" like with timestamps. Plus, timestamps would also require pointers, in which case the timestamps themselves don't accurately model the order of revisions, so...
If people merge conflicting histories (or journeys, to stick with the cheesy language) then they just change filenames, rather than pointers in-file.
- If a revision that's already been applied is changed to occur after a new revision, things will already break (because
jrny
won't find the old revision file) and that's good - Unapplied revisions must always then be added after applied ones
CAVEAT
Switching to a sequence is not backwards-compatible