ontola / hextuples

An RDF serialization format designed for performance in the browser

Drop http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode ?

RubenVerborgh opened this issue

The URI does not resolve, and it's a long string comparison to be performed. Suggestion: just leave empty.

And for blank, use a single char like _ or so.

commented

Yeah, this is basically a published version of an experimental internal model. I found it disappointing that I couldn't find them on LOV.

We want to have some kind of IRI though (which should indeed resolve), since that lets the parser simply switch on the datatype unconditionally. AFAIK (most) JS engines use interned strings, so the comparison should come down to a simple reference compare.

Well, an identical series of characters parsed n times will still occupy n times the space. Interning doesn’t reconcile those. So if it must be a URI (and I don’t see why that would provide an advantage, certainly not memory- or speed-wise), perhaps make it a short URN.

You can still switch unconditionally, just much faster.

I agree that the current URL should be dropped, as it does not resolve.

In most RDF serialization formats, the default datatype for literals is String. If HexTuples used NamedNode (or the URI) as the default datatype, it should always serialize strings with an explicit xsd:string. That way, NamedNode statements do not need an IRI in the datatype position.
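A minimal sketch of what that serialization rule could look like. The helper name `toHextuple` and the shape of the `term` object are assumptions made for illustration; neither comes from the HexTuples spec or its parser.

```javascript
// Hypothetical serializer sketch: an empty datatype marks a named node,
// and every literal is written with an explicit datatype.
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

// term: { termType: "NamedNode" | "Literal", value, datatype?, language? }
function toHextuple(subject, predicate, term, graph = "") {
  if (term.termType === "NamedNode") {
    // No datatype IRI needs to be written (or compared) for named nodes.
    return [subject, predicate, term.value, "", "", graph];
  }
  const language = term.language || "";
  // Language-tagged strings have the rdf:langString datatype in RDF 1.1;
  // all other literals without a datatype fall back to xsd:string.
  const fallback = language ? RDF_LANGSTRING : XSD_STRING;
  return [subject, predicate, term.value, term.datatype || fallback, language, graph];
}
```

With this rule a plain string such as `"Alice"` is always written with `xsd:string` in the `dt` position, so an empty `dt` can only mean a named node.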

If it really needs an IRI, I suggest linking to a NamedNode concept in this very spec - it would be a sensible place to resolve to. Or maybe the URI spec?

commented

xsd:anyURI is a good candidate replacement for rdf:namedNode

No, that’s a URI, not the node named by that URI.

commented

Perhaps I'm misunderstanding, but

[s, p, v, dt, l, g]

that’s a URI

rdf:namedNode is currently in the dt position which indicates the datatype of v

not the node named by that URI

No that'd be the v position

So together

[s, p, "schema.org/name", "http://www.w3.org/2001/XMLSchema#anyURI", l, g]

Though rereading that part of the spec more closely, xsd:anyURI values are allowed to be relative, which might pose a problem

There's a difference between

  • <http://example.org>
  • "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI>

Both exist and they are not the same.

Even "foo"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode> exists (despite the URI not resolving).

So I relaunch my suggestion of "" and "_", which will make a tremendous performance difference. Bonus: you can then even switch on dt.length and not on its contents, because no valid URIs of length 0 and 1 exist.
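A rough sketch of the length-based switch, assuming `""` marks a named node and `"_"` marks a blank node as proposed above. The function name `termTypeFromLength` is made up for illustration.

```javascript
// Classify the value position of a hextuple by looking only at dt.length.
// Assumption: "" = named node, "_" = blank node, anything longer = literal datatype IRI.
function termTypeFromLength(dt) {
  switch (dt.length) {
    case 0:
      return "NamedNode";
    case 1:
      return "BlankNode";
    default:
      // Any real datatype IRI is longer than one character.
      return "Literal";
  }
}

// Usage: the datatype sits at index 3 of [s, p, v, dt, l, g].
const tuple = ["http://example.org/s", "http://example.org/p", "http://example.org/o", "", "", ""];
console.log(termTypeFromLength(tuple[3]));
```

The switch never inspects the characters of `dt`, so its cost does not depend on how long the datatype IRIs of literals are.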

commented

There's a difference between

  • <http://example.org>
  • "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI>

Hmm, seems bizarre to me. I intuitively figured that, since 'everything is a resource', literals are just a convenient way to point to certain resources which lack a uri space / are too complex for the uri spec. Thinking of the datatype iri as the 'scheme' and the value as the 'path'.

I'll go and rethink some things ;)

Here's a quick performance test: https://gist.github.com/RubenVerborgh/1b70a456230027468a715b54afb59242

On my machine:

  • 2.5M triples with full URIs for dt: 4.6s
  • 2.5M triples with single chars for dt: 2.8s
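For readers who want to try a comparison of this kind locally, here is a small sketch. It is not the code from the linked gist; the helper names `makeLines` and `timeParse` and the line count are assumptions, and the timings it prints will vary by machine and engine.

```javascript
// Hypothetical micro-benchmark: parse JSON-encoded hextuples and count named nodes,
// once with a full IRI in the dt position and once with an empty string.
const LONG_MARKER = "http://www.w3.org/1999/02/22-rdf-syntax-ns#namedNode";

function makeLines(count, dt) {
  const line = JSON.stringify([
    "http://example.org/s", "http://example.org/p", "http://example.org/o", dt, "", "",
  ]);
  return new Array(count).fill(line);
}

function timeParse(lines, namedNodeMarker) {
  const start = Date.now();
  let namedNodes = 0;
  for (const line of lines) {
    // Each parse produces fresh string values for every position.
    const parsed = JSON.parse(line);
    if (parsed[3] === namedNodeMarker) namedNodes++;
  }
  return { namedNodes, ms: Date.now() - start };
}

const COUNT = 100000; // kept small so the sketch runs quickly
console.log("full IRI dt:", timeParse(makeLines(COUNT, LONG_MARKER), LONG_MARKER));
console.log("empty dt:   ", timeParse(makeLines(COUNT, ""), ""));
```

Note that shorter lines also mean less text for `JSON.parse` to scan, so any difference measured this way reflects both parsing and comparison cost.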

commented

literals are just a convenient way to point to certain resources which lack a uri space / are too complex for the uri spec

*Or vice versa: that URIs are used to determine points in irregularly defined spaces

Hmm, seems bizarre to me.

See "http://example.org"^^<http://www.w3.org/2001/XMLSchema#anyURI> as a shortcut for

_:x a Literal.
_:x _:value "http://example.org".
_:x _:dataType <http://www.w3.org/2001/XMLSchema#anyURI>.

The syntax in fact hints at this interpretation, with ^ representing reverse path traversal in N3.

Full answer in https://www.w3.org/TR/rdf11-mt/

commented

Closed in https://github.com/ontola/hextuples-parser/releases/tag/v2.0.0

Has been replaced with globalId and localId respectively
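
A sketch of how a consumer might branch on the new markers, assuming `globalId` and `localId` appear verbatim as the string in the `dt` position (the comment above does not spell out the exact encoding). The function name `termTypeFromKeyword` is hypothetical.

```javascript
// Classify the value position of a hextuple by its datatype marker.
// Assumption: "globalId" = named node, "localId" = blank node, anything else = literal datatype.
function termTypeFromKeyword(dt) {
  switch (dt) {
    case "globalId":
      return "NamedNode";
    case "localId":
      return "BlankNode";
    default:
      return "Literal";
  }
}

// Usage with the [s, p, v, dt, l, g] layout discussed in this thread.
const statement = ["http://example.org/s", "http://example.org/p", "http://example.org/o", "globalId", "", ""];
console.log(termTypeFromKeyword(statement[3]));
```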