bundle: Dependency management of schema and data

Question

bundle: Dependency management of schema and data

erichanson opened this issue 7 months ago · comments

In order to start version-controlling schema/objects in bundle, bundle is going to need robust dependency tracking.

Right now, bundle only version controls "normal" row data, but even normal data has dependencies, in the sense of foreign keys to other rows. The checkout function has been working around this by just disabling constraints during checkout and then re-enabling them when it's done, and also checking out rows in meta (schema, postgres objects) before any "normal" row data. If a row foreign-keys to a row outside this bundle, and that bundle doesn't exist, checkout just fails.

But when version-controlling functions, even these work-arounds don't work: When a function depends on another function, the dependency simply must exist (whether constraints are disabled or not) before the depender can be created. Since functions right now are just restored in random order, checkout breaks because it tries to create function a when it depends on function b which just happens to be further down the list.

This has been on the radar for a while, with the idea that it could be addressed by inspection at checkout time, but without doing static analysis of the function definition, there just isn't enough information to checkout rows in a sequence that respects dependencies.

t's tempting to continue working around it with some horrible work-around, like, simply try to checkout every function in a commit, flag the ones that didn't work, and try them again after the ones that did work are present, rince and repeat until the entire dependency tree has been traversed by just repeatedly failing. This would actually probably work, but it is of course horrible and I think there are some huge benefits to solving it for real.

So, the whole space needs to be addressed.

Ultimately, checkout just needs to know what order to checkout the rows in. Meta rows (objects/schema) first, data second, but within each group, an ordering that respects dependencies. However, since both schema and data are just rows to Aquameta, I believe we can record data dependencies (foreign keys) and schema dependencies (rows in the meta.* tables) with a single solution.

If this were working, dependency management could happen automatically, just through introspection. You would never need to explicitly say "this bundle depends on that bundle" because it could all be inferred by analyzing pg_depend for objects, and foreign key constraints for data.

Some prototyping:

create function bundle.row_dependencies(meta.row_id) returns setof meta.row_id as $$
    -- magic
$$ language plpgsql;

-- view that lists dependencies of every head row
create view bundle.dependency as
select hcr.row_id, b.id, b.name as bundle_name, meta_type, dependent_bundle_id, dependent_row_id
from bundle.head_commit_row hcr
    join bundle.bundle b on hcr.bundle_id = b.id
    join report.dependency -- https://wiki.postgresql.org/wiki/Pg_depend_display
...

-- ultimately, determine an order and save it in the commit
alter table rowset_row add column ordinal_position integer;

-- track cross-bundle deps explicitly
create table bundle_dependency (
    id uuid not null default public.uuid_generate_v4() primary key,
    bundle_id uuid references bundle(id),
    dependency_id uuid references bundle(id)
);