Drop http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode ?

Question

RubenVerborgh opened this issue 4 years ago

The URI does not resolve, and it's a long string comparison to be performed. Suggestion: just leave empty.

And for blank nodes, use a single character such as _.

Thom · Answer 1 · Thu May 14 2020 18:41:20 GMT+0800 (China Standard Time)

Yeah, this is basically a published version of an experimental internal model. I found it disappointing that I couldn't find such terms on LOV.

We do want some kind of IRI though (which should indeed resolve), since that lets the parser simply switch on the datatype unconditionally. AFAIK (most) JS engines use interned strings, so the comparison should come down to a simple reference comparison.

Ruben Verborgh · Answer 2 · Thu May 14 2020 18:49:41 GMT+0800 (China Standard Time)

Well, an identical series of characters parsed n times will still occupy n times the space. Interning doesn’t reconcile those. So if it must be a URI (and I don’t see why that would provide an advantage, certainly not memory- or speed-wise), perhaps make it a short URN.

You can still switch unconditionally, just much faster.

Joep Meindertsma · Answer 3 · Tue May 26 2020 23:16:56 GMT+0800 (China Standard Time)

I agree that the current URL should be dropped, as it does not resolve.

In most RDF serialization formats, the default datatype for literals is String. If HexTuples were to use NamedNode (or the URI) as the default datatype, it should always serialize strings with xsd:string. That way, NamedNode tuple statements do not need an IRI in the datatype.

If it really needs an IRI, I suggest linking to a NamedNode concept in this very spec - it would be a sensible place to resolve to. Or maybe the URI spec?

Thom · Answer 4 · Fri Jun 19 2020 18:11:57 GMT+0800 (China Standard Time)

xsd:anyURI is a good candidate replacement for rdf:namedNode

Ruben Verborgh · Answer 5 · Fri Jun 19 2020 18:18:26 GMT+0800 (China Standard Time)

No, that’s a URI, not the node named by that URI.

Thom · Answer 6 · Fri Jun 19 2020 19:06:47 GMT+0800 (China Standard Time)

Perhaps I'm misunderstanding, but

[s, p, v, dt, l, g]

> that’s a URI

rdf:namedNode is currently in the dt position, which indicates the datatype of v.

> not the node named by that URI

No, that'd be the v position.

So together

[s, p, "schema.org/name", "http://www.w3.org/2001/XMLSchema#anyURI", l, g]

Though, rereading that part of the spec more closely, anyURI values are allowed to be relative, which might pose a problem.

Ruben Verborgh · Answer 7 · Fri Jun 19 2020 19:10:21 GMT+0800 (China Standard Time)

There's a difference between

<http://example.org>
"http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI>

Both exist and they are not the same.

Even "foo"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode> exists (despite the URI not resolving).
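To make the distinction concrete, here is a minimal sketch of the two kinds of term. The term shapes are loosely modeled on the RDF/JS data model, which is an assumption on my part; the thread does not prescribe a representation, and `namedNode`, `literal`, and `termEquals` are illustrative helpers (language tags are ignored for brevity).

```typescript
// Minimal term shapes, loosely modeled on the RDF/JS data model (an assumption).
interface NamedNode { termType: "NamedNode"; value: string }
interface Literal { termType: "Literal"; value: string; datatype: NamedNode }
type Term = NamedNode | Literal;

const XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI";

// Hypothetical constructor for the node named by an IRI.
function namedNode(iri: string): NamedNode {
  return { termType: "NamedNode", value: iri };
}

// Hypothetical constructor for a literal with a datatype IRI.
function literal(lexical: string, datatypeIri: string): Literal {
  return { termType: "Literal", value: lexical, datatype: namedNode(datatypeIri) };
}

// Two terms are equal only if they are the same kind of term with the same
// value and, for literals, the same datatype.
function termEquals(a: Term, b: Term): boolean {
  if (a.termType !== b.termType) return false;
  if (a.value !== b.value) return false;
  if (a.termType === "Literal" && b.termType === "Literal") {
    return a.datatype.value === b.datatype.value;
  }
  return true;
}

// <http://example.org>
const node = namedNode("http://example.org");
// "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI>
const uriLiteral = literal("http://example.org", XSD_ANY_URI);
```

Under these assumptions, `termEquals(node, uriLiteral)` is false even though both terms carry the same string: one is the node, the other is a literal whose lexical form happens to be a URI.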

So I relaunch my suggestion of "" and "_", which will make a tremendous performance difference. Bonus: you can then even switch on dt.length and not on its contents, because no valid URIs of length 0 and 1 exist.
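As a rough sketch of the length-based switch, assuming the "" and "_" convention proposed here were adopted (it is a suggestion in this thread, not necessarily what the spec settled on), a parser could classify the value position as follows. `TermKind` and `kindOfValue` are hypothetical names.

```typescript
// Hypothetical term kinds; these names are illustrative only.
type TermKind = "namedNode" | "blankNode" | "literal";

// Classify the value of a tuple [s, p, v, dt, l, g] by the length of dt alone.
// Assumes "" marks a named node and a single character such as "_" marks a
// blank node; anything longer is treated as a datatype IRI for a literal.
function kindOfValue(dt: string): TermKind {
  switch (dt.length) {
    case 0:
      return "namedNode";
    case 1:
      return "blankNode";
    default:
      return "literal";
  }
}
```

For example, `kindOfValue("")` yields "namedNode" and `kindOfValue("http://www.w3.org/2001/XMLSchema#string")` yields "literal", without comparing the contents of any long string.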

Thom · Answer 8 · Fri Jun 19 2020 19:33:15 GMT+0800 (China Standard Time)

> There's a difference between
>
> <http://example.org>
> "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI>

Hmm, seems bizarre to me. I intuitively figured that, since 'everything is a resource', literals are just a convenient way to point to certain resources which lack a URI space / are too complex for the URI spec. I was thinking of the datatype IRI as the 'scheme' and the value as the 'path'.

I'll go and rethink some things ;)

Ruben Verborgh · Answer 9 · Fri Jun 19 2020 19:33:15 GMT+0800 (China Standard Time)

Here's a quick performance test: https://gist.github.com/RubenVerborgh/1b70a456230027468a715b54afb59242

On my machine:

2.5M triples with full URIs for dt: 4.6s
2.5M triples with single chars for dt: 2.8s

Thom · Answer 10 · Fri Jun 19 2020 19:34:51 GMT+0800 (China Standard Time)

> literals are just a convenient way to point to certain resources which lack a uri space / are too complex for the uri spec

*Or vice versa: that URIs are used to determine points in irregularly defined spaces.

Ruben Verborgh · Answer 11 · Fri Jun 19 2020 19:36:14 GMT+0800 (China Standard Time)

> Hmm, seems bizarre to me.

See "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI> as a shortcut for

_:x a Literal.
_:x _:value "http://example.org".
_:x _:dataType <http://www.w3.org/2001/XMLSchema#anyURI>.

The syntax in fact hints at this interpretation, with ^ representing reverse path traversal in N3.
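The shortcut reading can be spelled out mechanically. The sketch below is only an illustration: `expandLiteral` is a hypothetical helper, the `_:x`, `_:value`, and `_:dataType` labels and the bare `Literal` class are this comment's informal notation rather than a published vocabulary, and `JSON.stringify` is used as a rough stand-in for proper string escaping.

```typescript
// Expand a datatyped literal into the three statements of the informal
// interpretation above. The output lines are notation, not valid Turtle
// against any real vocabulary.
function expandLiteral(lexical: string, datatypeIri: string): string[] {
  return [
    "_:x a Literal.",
    // JSON.stringify only approximates string escaping here (an assumption).
    `_:x _:value ${JSON.stringify(lexical)}.`,
    `_:x _:dataType <${datatypeIri}>.`,
  ];
}

const expanded = expandLiteral(
  "http://example.org",
  "http://www.w3.org/2001/XMLSchema#anyURI",
);
```

Read this way, the string "http://example.org" is just the value attached to the literal node, which is why the literal is a different thing from the resource that the IRI names.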

Full answer in https://www.w3.org/TR/rdf11-mt/

Thom · Answer 12 · Mon Dec 06 2021 21:40:49 GMT+0800 (China Standard Time)

Closed in https://github.com/ontola/hextuples-parser/releases/tag/v2.0.0

The named-node and blank-node datatypes have been replaced with globalId and localId, respectively.