Allow anonymization of tables without an AID column.
cristianberneanu opened this issue
In such cases, a synthetic row index value should be used as the AID.
Do you have any way of accomplishing this in mind? The only silly idea that occurred to me is to use ctid, but it segfaults :P.
If you have no prior thoughts on this, I'll start looking for options.
My (un)educated guess points me towards these bits:
static void append_aid_args(Aggref *aggref, List *aid_refs)
{...
  foreach (cell, aid_refs)
  {
    AidRef *aid_ref = (AidRef *)lfirst(cell);
    TargetEntry *aid_entry = make_aid_target(aid_ref, list_length(aggref->args) + 1, false);
    /* Append the AID argument to function's arguments. */
    aggref->args = lappend(aggref->args, aid_entry);
    aggref->aggargtypes = lappend_oid(aggref->aggargtypes, aid_ref->aid_column->atttype);
...
and
static TargetEntry *make_aid_target(AidRef *aid_ref, AttrNumber resno, bool resjunk)
{
  TargetEntry *te = makeTargetEntry(make_aid_expr(aid_ref), resno, "aid", resjunk);
  te->resorigtbl = aid_ref->relation->oid;
  te->resorigcol = aid_ref->aid_column->attnum;
  return te;
}
but around make_aid_expr I would build an expression doing something like row_number() OVER (). Plugging row_number into the query manually segfaults at the publish_* access level (EDIT: this segfault looks unrelated to row_number; more likely it's #289), but otherwise it looks like exactly what we need.
Does this sound sane? CC @edongashi
Maybe there's a way to export CTID. Row number is not good because it can change once we have filters. If there's no straightforward system column which we can expose, I would put this on hold for now.
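To make the row number concern concrete, here is a minimal standalone sketch in plain C (no PostgreSQL headers). The `Row` type and `assign_row_numbers` function are hypothetical and exist only for this illustration. The sketch mimics how row_number() OVER () is evaluated after the WHERE clause, so the same row can receive a different number once a filter is added.

```c
#include <stddef.h>

/* Hypothetical row type, for illustration only. */
typedef struct
{
  int value;          /* a column that a filter might test */
  long synthetic_aid; /* row number assigned below; 0 means filtered out */
} Row;

/*
 * Numbers the rows that pass the filter (value >= min_value) as 1..n in
 * scan order, the way row_number() OVER () numbers rows after WHERE.
 * Returns how many rows passed the filter.
 */
size_t assign_row_numbers(Row *rows, size_t count, int min_value)
{
  size_t next = 0;
  for (size_t i = 0; i < count; i++)
  {
    if (rows[i].value >= min_value)
      rows[i].synthetic_aid = (long)++next;
    else
      rows[i].synthetic_aid = 0;
  }
  return next;
}
```

With rows holding the values 10, 20, 30, the last row is numbered differently depending on whether the first row passes the filter, so a row number would not be a stable AID across queries.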
The downside of a physical location is that we get different AID noise samples if the same dataset is hosted multiple times, since identical rows can end up at different physical locations in each copy.
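A rough sketch of that downside, assuming the noise seed is derived by hashing the row's physical location (block number and offset, as in a ctid). The `PhysicalLocation` type, the function names, and the choice of FNV-1a are assumptions made for this illustration, not what the extension actually does.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ctid: (block number, offset within block). */
typedef struct
{
  uint32_t block;
  uint16_t offset;
} PhysicalLocation;

/* One FNV-1a step: xor in a byte, then multiply by the 64-bit FNV prime. */
uint64_t fnv1a_step(uint64_t hash, uint8_t byte)
{
  return (hash ^ byte) * 0x100000001b3ULL;
}

/* Derives a seed from the physical location only (illustrative assumption). */
uint64_t seed_from_location(PhysicalLocation loc)
{
  uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
  for (int shift = 0; shift < 32; shift += 8)
    hash = fnv1a_step(hash, (uint8_t)(loc.block >> shift));
  for (int shift = 0; shift < 16; shift += 8)
    hash = fnv1a_step(hash, (uint8_t)(loc.offset >> shift));
  return hash;
}
```

If one deployment stores a row at block 0, offset 1 and another stores the same row at block 7, offset 1, the two seeds differ, so the two hosts would produce different noise for identical data.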
Since there is no straightforward solution in sight and this feature is not that important, let's drop it.