w3c / rdf-canon

RDF Dataset Canonicalization (deliverable of the RCH working group)

Home Page: https://w3c.github.io/rdf-canon/spec/

Implementation feedback

pchampin opened this issue · comments

Here is a list of remarks that I noted while following the spec to implement the algorithm in Rust.
Rather than creating a bunch of small issues, I kept everything in one big issue. We can split some items out if they need a separate discussion.

  • About code point order
    • it might be useful to point out that, if strings are internally encoded as UTF-8, then comparing them byte-wise (using standard "lexicographic" order) gives the same order as code point order
      • this is confirmed by Wikipedia and by the Rust documentation
      • or at least, we should advise developers to check whether the string comparison operators in the language they use follow code point order or another one
  • algo Canonicalization
    • step 3.2 "including repetitions" is a bit mysterious. I assume that it means that I must allow n to occur several times in the list mapped to $h_f(n)$, but I don't even see when this is supposed to happen
      • or does it just mean that $h_f(n)$ may occur several times? In which case this is not consistent with the definition of "hash to blank nodes" (map of hash to lists of nodes)
      • as a matter of fact "add $h_f(n)$ and n" to the map is a bit ambiguous
    • step 5.1 is a bit ambiguous: it mentions the Hash N-Degree Quads algorithm, but only to indicate the expected type of the elements of the list; the algorithm is not meant to be called here (but in step 5.2.4)
      • the 'explanation' is equally confusing, because it says "this list establishes an order"
        • I suggest replacing the explanation with "this list will be populated by step 5.2, and will establish an order..."
  • algo Hash Related Blank Node
    • it might be useful to hint that the issuer passed to this algo is not mutated by the algorithm
    • calling Hash 1st Degree Quads in step 4 raises the question of optimizing it if we are going to call it several times with the same node
      • if some form of memoization could help, this should probably be hinted -- in particular, we could store it in the c14n state...
      • as a matter of fact, this is what my implementation does
  • algo Hash N-degree Quads
    • it might be useful to hint that the issuer passed to this algo is not mutated by the algorithm
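
To illustrate the code point order remark at the top of this list, here is a minimal sketch (not taken from the spec) of the kind of check I have in mind. The helper names `cmp_code_points` and `cmp_utf16_units` are made up for this example. It compares one character from the Basic Multilingual Plane (U+FF5E) with one outside it (U+1F600): byte-wise UTF-8 comparison agrees with code point order, while comparing UTF-16 code units, which I believe is what some languages do by default, gives the opposite result.

```rust
use std::cmp::Ordering;

/// Compare two strings by Unicode code point order.
/// (Hypothetical helper, named for this example only.)
fn cmp_code_points(a: &str, b: &str) -> Ordering {
    a.chars().cmp(b.chars())
}

/// Compare two strings by UTF-16 code unit order.
/// (Hypothetical helper, named for this example only.)
fn cmp_utf16_units(a: &str, b: &str) -> Ordering {
    a.encode_utf16().cmp(b.encode_utf16())
}

fn main() {
    let bmp = "\u{FF5E}"; // UTF-8: EF BD 9E, UTF-16: FF5E
    let astral = "\u{1F600}"; // UTF-8: F0 9F 98 80, UTF-16: D83D DE00

    // Code point order: U+FF5E sorts before U+1F600.
    assert_eq!(cmp_code_points(bmp, astral), Ordering::Less);

    // Rust's `Ord` for `str` compares the UTF-8 bytes, and agrees.
    assert_eq!(bmp.cmp(astral), Ordering::Less);
    assert_eq!(bmp.as_bytes().cmp(astral.as_bytes()), Ordering::Less);

    // UTF-16 code unit order disagrees: the surrogate D83D sorts before FF5E.
    assert_eq!(cmp_utf16_units(bmp, astral), Ordering::Greater);

    println!("UTF-8 byte order matches code point order for this pair");
}
```

A single pair is of course not a proof, but it is the kind of test case an implementer could use to check what their language's default string comparison does.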