ambiguity about canonical N-Triples / N-Quads

Question


pchampin opened this issue 2 years ago

Pierre-Antoine Champin commented 2 years ago

The specification of canonical N-Triples is silent about the datatype of xsd:string literals. More specifically:

    "hello world"

and

   "hello world"^^<http://www.w3.org/2001/XMLSchema#string>

are equivalent terms in N-Triples and N-Quads, and the spec does not say which one should be used as the canonical form.

Given that this is missing from the N-Triples spec, the rdf-canon spec should choose one and be explicit about it.
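The N-Triples spec leaves this open. Purely as an illustration, here is a minimal Python sketch of a literal serializer that makes the choice explicit by always omitting the xsd:string datatype IRI. Choosing that form is an assumption of the sketch, as are the names `XSD_STRING` and `canonical_literal`. Escaping covers only backslash, double quote, line feed and carriage return; other characters are passed through unchanged.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Only these four characters are escaped here (backslash, quote, LF, CR);
# this sketch passes every other character through unchanged.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def canonical_literal(lexical, datatype=XSD_STRING):
    """Serialize a non-language-tagged literal as an N-Triples term.

    Assumed canonical choice: xsd:string literals are written as simple
    literals, i.e. without the ^^<...#string> suffix.
    """
    quoted = '"' + "".join(_ESCAPES.get(ch, ch) for ch in lexical) + '"'
    if datatype == XSD_STRING:
        return quoted
    return quoted + "^^<" + datatype + ">"


# Both spellings from the issue collapse to the same output term.
print(canonical_literal("hello world"))              # "hello world"
print(canonical_literal("hello world", XSD_STRING))  # "hello world"
```

The opposite choice (always writing the datatype IRI) would be just as deterministic; what matters for canonicalization is that exactly one form is mandated.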

Ted Thibodeau Jr · Answer 1 · Wed Jan 11 2023 00:02:14 GMT+0800 (China Standard Time)

This should also be fed to the rdf-star WG, who can also update the N-Triples and N-Quads specs accordingly.

Gregg Kellogg · Answer 2 · Wed Jan 11 2023 01:21:37 GMT+0800 (China Standard Time)

Other than for Canonicalization, RDF serialization formats are typically restricted to parsing, not serializing; JSON-LD being the main exception.

RDF Concepts discusses this with MAY language:

Please note that concrete syntaxes may support simple literals consisting of only a lexical form without any datatype IRI or language tag. Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string. Similarly, most concrete syntaxes represent language-tagged strings without the datatype IRI because it always equals http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.

Making this a MUST for canonical forms is indeed something that needs to go into the updated N-Triples and N-Quads specs, in their canonicalization sections. Similarly, rdf:langString MUST NOT be used as an explicit datatype for language-tagged literals, although the grammar doesn't support this in any case.

This, and the previous note on the need for canonicalization in N-Triples, should be raised as cross-referenced issues for those specs, but it is best to wait until their repositories have been set up, which should happen before too much longer.
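On the input side, a parser should treat both spellings as the same abstract literal. The minimal Python sketch below illustrates that by applying the defaults from the RDF Concepts passage quoted above: xsd:string for simple literals and rdf:langString for language-tagged ones. The helper name `parse_literal` is hypothetical, and the regular expression is deliberately simplified: it handles a single literal term, no `\u`/`\U` escapes, and a loose language-tag pattern. As a choice of this sketch, it also rejects an explicit `^^<...#langString>` without a language tag, since such a literal cannot carry the tag its datatype requires.

```python
import re

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# A quoted string, optionally followed by @lang or ^^<iri> (simplified grammar).
_LITERAL = re.compile(
    r'^"((?:[^"\\\n\r]|\\[\\"nr])*)"'
    r'(?:@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\^\^<([^<>"\s]*)>)?$'
)
_UNESCAPE = {"\\\\": "\\", '\\"': '"', "\\n": "\n", "\\r": "\r"}


def parse_literal(term):
    """Return (lexical_form, datatype_iri, language_tag) for an N-Triples literal."""
    m = _LITERAL.match(term)
    if m is None:
        raise ValueError(f"not a supported literal: {term!r}")
    quoted, lang, datatype = m.groups()
    # Undo the four escapes this sketch supports.
    lexical = re.sub(r'\\[\\"nr]', lambda e: _UNESCAPE[e.group(0)], quoted)
    if lang is not None:
        # Language-tagged strings implicitly have the rdf:langString datatype.
        return (lexical, RDF_LANG_STRING, lang)
    if datatype == RDF_LANG_STRING:
        # rdf:langString needs a language tag, which this form cannot carry.
        raise ValueError("rdf:langString literal without a language tag")
    # A simple literal is syntactic sugar for a literal typed xsd:string.
    return (lexical, datatype or XSD_STRING, None)
```

Comparing parsed terms rather than raw text is what makes `"hello world"` and `"hello world"^^<http://www.w3.org/2001/XMLSchema#string>` compare equal.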

Ted Thibodeau Jr · Answer 3 · Fri Jan 20 2023 05:25:53 GMT+0800 (China Standard Time)

[@gkellogg] RDF serialization formats are typically restricted to parsing, not serializing

I'm not at all sure what you mean by that... "serialization formats" are not for "serializing"?

Gregg Kellogg · Answer 4 · Tue Jan 24 2023 05:41:47 GMT+0800 (China Standard Time)

[@gkellogg] RDF serialization formats are typically restricted to parsing, not serializing

I'm not at all sure what you mean by that... "serialization formats" are not for "serializing"?

Does sound like an oxymoron :) But there are typically no normative statements on how to serialize RDF graphs or datasets, other than the N-Triples canonical form, which has its own problems and restricts itself to serializing a single triple, not a graph. The specs describe the syntax and how to parse it, but not how to serialize it. Another exception is JSON-LD, which does describe how to serialize datasets to JSON-LD.

Gregg Kellogg · Answer 5 · Tue Jan 24 2023 05:42:44 GMT+0800 (China Standard Time)

See w3c/rdf-n-triples#2 and w3c/rdf-n-quads#2.

Ted Thibodeau Jr · Answer 6 · Wed Jan 25 2023 12:50:47 GMT+0800 (China Standard Time)

there are typically no normative statements on how to serialize RDF graphs or datasets

Well, that seems like a horrendous oversight and, dare I say, a bug in each document that lacks them. It's no wonder there are nonstop issues with interop and uptake, slowly growing interest in RDF/LD notwithstanding!

Pierre-Antoine Champin · Answer 7 · Wed Jan 25 2023 15:59:14 GMT+0800 (China Standard Time)

Well, that seems like a horrendous oversight

Well, the implicit contract of any serializer is to serialize your data into something that parses back to the same data.

But granted, this could be made explicit, probably with a more specific definition of what we consider to be the "same" data (in RDF, this means "isomorphism", because blank nodes... well, you know!).

Gregg Kellogg · Answer 8 · Thu Jan 26 2023 06:33:23 GMT+0800 (China Standard Time)

Well, that seems like a horrendous oversight

I don't think RDF uptake problems can be blamed on the lack of specs that explicitly define how to serialize an RDF Graph/Dataset, nor should they be, IMHO. At most, there might be a statement that serialized graph/dataset representations MUST be valid according to the associated grammar rules. If you think in terms of computer languages, the abstract RDF syntax is closer to a machine language, with N-Triples and N-Quads like assembly languages, and Turtle/TriG/RDFa/JSON-LD like high-level languages targeting that machine language. An argument can be made that there is a normative way to represent the abstract syntax in N-Triples and N-Quads (notwithstanding blank node identifiers), but not for the others. JSON-LD provides a way to transform a dataset into JSON-LD, but not the way to do so.

Looking elsewhere, SPARQL describes an algebra that is targeted by the syntax. There are systems that will re-serialize the algebra into the SPARQL Grammar, but no normative statements about doing so.

We provide a number of examples for representing data in the various concrete syntaxes, and define how to parse those representations to transform them into the underlying representation. Trying to codify how to re-create that serialization from the underlying representation is certainly outside our charter, and not something we should get into in any case, IMHO.

But granted, this could be made explicit, probably with a more specific definition of what we consider to be the "same" data (in RDF, this means "isomorphism", because blank nodes... well, you know!).

We do define graph/dataset isomorphism; conceivably, a statement could be made that a serialization of a graph or dataset, when re-parsed, MUST be isomorphic to that graph or dataset.
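As a rough illustration of that round-trip criterion, here is a brute-force Python sketch of graph isomorphism. It assumes a toy representation in which a triple is a tuple of strings and blank nodes are strings starting with `_:`; the helper names are hypothetical. It tries every bijection between the two graphs' blank nodes, so it is only usable for tiny graphs, and it is not the canonicalization algorithm that rdf-canon specifies.

```python
from itertools import permutations


def _is_bnode(term):
    # Toy convention (assumption): blank nodes are written "_:label".
    return term.startswith("_:")


def _bnodes(graph):
    return sorted({t for triple in graph for t in triple if _is_bnode(t)})


def isomorphic(g1, g2):
    """True if some bijection between blank nodes maps g1 exactly onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = _bnodes(g1), _bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        # Rename blank nodes in g1; IRIs and literals are left unchanged.
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False
```

A re-parsed graph will usually have fresh blank node labels, so plain set equality fails even when the round trip was faithful; `isomorphic` is the check that stays meaningful.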

Markus Sabadello · Answer 9 · Thu May 04 2023 02:43:05 GMT+0800 (China Standard Time)

Has this been solved by merging #96?

Gregg Kellogg · Answer 10 · Thu May 04 2023 02:55:47 GMT+0800 (China Standard Time)

Yes, I believe it has.

Markus Sabadello · Answer 11 · Wed May 10 2023 22:43:31 GMT+0800 (China Standard Time)

On the 10 May 2023 call, the WG decided to close this issue.