w3c / rdf-canon

RDF Dataset Canonicalization (deliverable of the RCH working group)

Home Page: https://w3c.github.io/rdf-canon/spec/

Exact nature of the issued identifier list (editorial)

iherman opened this issue

In the §4.4 Issue Identifier algorithm, the exact nature of the issued identifiers list is not really clear. Would it not be clearer to say:

List of mappings from existing identifiers to issued identifiers, in the order in which the issued identifiers were issued. This list is used to track the prevent issuing more than one new identifier per existing identifier, and to allow blank nodes to be reassigned identifiers some time after issuance.

Maybe even better to formalize these in terms of infra:

a list of tuples of the form (existing, issued).

In general, I believe using infra for the description/definition of data structures might be better, it would avoid ambiguities.
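To make the "list of tuples" reading concrete, here is a minimal Python sketch of an issuer that stores (existing, issued) pairs in issuance order. The class name `ListIssuer`, the `c14n` prefix default, and the sample identifiers are assumptions made for illustration; this is not the spec's normative text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ListIssuer:
    """Hypothetical issuer keeping (existing, issued) tuples in issuance order."""

    prefix: str = "c14n"  # assumed prefix, for illustration only
    counter: int = 0
    issued: List[Tuple[str, str]] = field(default_factory=list)

    def issue(self, existing: str) -> str:
        # Reuse the identifier already issued for this existing identifier.
        for seen, label in self.issued:
            if seen == existing:
                return label
        # Otherwise mint a new one from the prefix and the counter.
        label = f"{self.prefix}{self.counter}"
        self.issued.append((existing, label))
        self.counter += 1
        return label


issuer = ListIssuer()
first = issuer.issue("b7")
second = issuer.issue("b2")
again = issuer.issue("b7")  # same existing identifier, so no new label
```

The lookup is a linear scan here, which is the practical cost of a plain list of tuples compared with a map.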

B.t.w., is it really important to have it "in the order in which they were issued"? If that constraint wasn't there, a map could be used instead of tuples. So far, I did not see any point in the algorithm that relied on the order.

(Caveat: I have implemented only the first-degree hashes. Maybe that is the reason.)

Also, "used to track the prevent issuing" -> "used to prevent issuance of".

A better name might be issued identifiers map, with a reference to Infra for map. It is a 1:1 mapping from existing to issued identifier, so order takes no part.

issued identifiers list
A map that relates existing identifiers to issued identifiers, to prevent issuance of more than one new identifier per existing identifier, and to allow blank nodes to be reassigned identifiers some time after issuance.

(Not that it really matters, but Infra maps are ordered.)
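As a rough illustration of the map-based definition, the sketch below uses a Python `dict`, which preserves insertion order (guaranteed since Python 3.7) and so behaves much like an Infra ordered map for this purpose. The class name `MapIssuer`, the `c14n` prefix, and the sample identifiers are assumptions, not taken from the spec.

```python
class MapIssuer:
    """Hypothetical issuer backed by an insertion-ordered map."""

    def __init__(self, prefix="c14n"):  # assumed prefix, for illustration only
        self.prefix = prefix
        self.counter = 0
        self.issued = {}  # existing identifier -> issued identifier

    def issue(self, existing):
        # At most one issued identifier per existing identifier.
        if existing in self.issued:
            return self.issued[existing]
        label = f"{self.prefix}{self.counter}"
        self.issued[existing] = label
        self.counter += 1
        return label


issuer = MapIssuer()
for node in ["b7", "b2", "b7", "b5"]:
    issuer.issue(node)

# Iteration follows issuance order, not the sort order of the keys.
order = list(issuer.issued.items())
```

With a map, the "already issued?" check is a direct lookup, and iterating over the entries still yields them in the order they were issued.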

@iherman,

B.t.w., is it really important to have it "in the order in which they were issued"? If that constraint wasn't there, a map could be used instead of tuples. So far, I did not see any point in the algorithm that relied on the order.

Yes, the order matters later as mentioned in the Hash N-Degree Quads algorithm.

Each unlabeled blank node is assigned a temporary label in the order in which it is reached in the gossip path being explored.
...
Ultimately, the algorithm selects a shortest gossip path, distributing canonical labels to the unlabeled blank nodes in the order in which they appear in this path. The hash of this encoded shortest path, called the N-degree hash of n, distinguishes n from other blank nodes in the dataset.

And, I believe, this manifests in step 6.3 of the core algorithm.

But if we're going to use Infra, we could use a Map since it is ordered as @gkellogg mentioned.
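A small sketch of why issuance order can matter, assuming the temporary issuer of a chosen gossip path is replayed, in order, against the canonical issuer. The helper names `new_issuer`, `issue`, and `replay`, the prefixes `b` and `c14n`, and the node names are all hypothetical.

```python
def new_issuer(prefix):
    # A dict-based issuer; the "issued" dict keeps insertion order.
    return {"prefix": prefix, "counter": 0, "issued": {}}


def issue(issuer, existing):
    # Return the earlier identifier if present, else mint a new one.
    if existing not in issuer["issued"]:
        issuer["issued"][existing] = f"{issuer['prefix']}{issuer['counter']}"
        issuer["counter"] += 1
    return issuer["issued"][existing]


def replay(temporary, canonical):
    # Issue canonical identifiers in the order the temporary ones were issued.
    for existing in temporary["issued"]:
        issue(canonical, existing)
    return canonical["issued"]


# Two explorations reach the same blank nodes in different orders.
path_a = new_issuer("b")
for node in ["e1", "e0", "e2"]:
    issue(path_a, node)

path_b = new_issuer("b")
for node in ["e0", "e1", "e2"]:
    issue(path_b, node)

labels_a = replay(path_a, new_issuer("c14n"))
labels_b = replay(path_b, new_issuer("c14n"))
```

Both paths cover the same nodes, yet `e0` and `e1` end up with swapped canonical labels, so an unordered structure would lose information the replay depends on.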

But if we're going to use Infra, we could use a Map since it is ordered as @gkellogg mentioned.

This would suggest a minor change to #39 to use "ordered map" instead of simply "map", in the definition, but the reference is still to the "ordered map" definition in Infra.