diffix / pg_diffix

Implementation of the Open Diffix anonymization mechanism for PostgreSQL.

Home Page: https://www.open-diffix.org

Allow anonymization of tables without an AID column.

cristianberneanu opened this issue

In such cases, a synthetic row index value should be used as the AID.

Do you have a way of accomplishing this in mind? The only (silly) idea that occurred to me is to use ctid, but it segfaults :P

If no prior thoughts on this, I'll start looking for options.

My (un)educated guess points me towards these bits:

static void append_aid_args(Aggref *aggref, List *aid_refs)
{...
  foreach (cell, aid_refs)
  {
    AidRef *aid_ref = (AidRef *)lfirst(cell);
    TargetEntry *aid_entry = make_aid_target(aid_ref, list_length(aggref->args) + 1, false);

    /* Append the AID argument to function's arguments. */
    aggref->args = lappend(aggref->args, aid_entry);
    aggref->aggargtypes = lappend_oid(aggref->aggargtypes, aid_ref->aid_column->atttype);
...

and

static TargetEntry *make_aid_target(AidRef *aid_ref, AttrNumber resno, bool resjunk)
{
  TargetEntry *te = makeTargetEntry(make_aid_expr(aid_ref), resno, "aid", resjunk);

  te->resorigtbl = aid_ref->relation->oid;
  te->resorigcol = aid_ref->aid_column->attnum;

  return te;
}

but inside make_aid_expr I would build an expression doing something like row_number() OVER (), which looks like exactly what we need. Plugging a row_number into the query manually segfaults at the publish_* access level (EDIT: this segfault looks unrelated to row_number; more likely it's #289).

Does this sound sane? CC @edongashi

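Maybe there's a way to expose CTID. Row number is not good because it can change once we have filters: it is assigned over the rows that survive the filter, so the same row can get a different value from one query to the next. If there's no straightforward system column which we can expose, I would put this on hold for now.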
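To make the filter problem concrete, here is a minimal standalone sketch (not pg_diffix code). The `Row` struct, its `slot` field (standing in for a stable physical position such as ctid), and `row_number_under_filter` are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy row: `slot` stands in for a stable physical position. */
typedef struct
{
  int slot;
  int age;
} Row;

/*
 * Returns the 1-based row number that the row at `target_slot` would get
 * when numbering only the rows with age >= min_age, or 0 if that row is
 * filtered out. This mimics numbering rows after a WHERE clause is applied.
 */
static int row_number_under_filter(const Row *rows, size_t n, int min_age, int target_slot)
{
  assert(rows != NULL);

  int rn = 0;
  for (size_t i = 0; i < n; i++)
  {
    if (rows[i].age < min_age)
      continue; /* Filtered rows do not consume a number. */
    rn++;
    if (rows[i].slot == target_slot)
      return rn;
  }
  return 0;
}
```

With rows `{0, 20}, {1, 35}, {2, 50}`, the row in slot 2 is numbered 3 with no filter, 2 with `min_age = 30`, and 1 with `min_age = 40`, while its slot never changes. An AID derived from the row number would therefore not identify the same row across queries.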

The downside of a physical location is that we get different AID noise samples if the same dataset is hosted multiple times. I can't think of a good solution as of right now.
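A rough sketch of why hosting matters, assuming the noise seed is derived by hashing the AID value. The `RowLocation` struct and `seed_from_location` are hypothetical, and FNV-1a is only a stand-in here, not necessarily the hash pg_diffix uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical physical row location, loosely modelled on a (block, offset) pair. */
typedef struct
{
  uint32_t block;
  uint16_t offset;
} RowLocation;

/* One FNV-1a step: mix a single byte into the running hash. */
static uint64_t fnv1a_step(uint64_t hash, uint8_t byte)
{
  return (hash ^ byte) * UINT64_C(1099511628211);
}

/* Illustrative seed: FNV-1a over the location's bytes (an assumption, not the real mechanism). */
static uint64_t seed_from_location(RowLocation loc)
{
  assert(loc.offset != 0); /* Offsets are treated as 1-based in this sketch. */

  uint64_t hash = UINT64_C(14695981039346656037);
  for (int i = 0; i < 4; i++)
    hash = fnv1a_step(hash, (uint8_t)(loc.block >> (8 * i)));
  for (int i = 0; i < 2; i++)
    hash = fnv1a_step(hash, (uint8_t)(loc.offset >> (8 * i)));
  return hash;
}
```

If one host stores a row at (0, 1) and another stores the same logical row at (0, 2), the two seeds differ, so the two deployments would produce different noise for the same data.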

Since there is no straightforward solution in sight and this feature is not that important, let's drop it.