tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

Default serialization for RDF

baskaufs opened this issue · comments

Recommendation 10 of the GUID Applicability Statement https://github.com/tdwg/guid-as/blob/master/guid/applicability_statement.doc states that the default response serialization for RDF should be XML. So that should probably be reflected in the documentation specification. However, XML no longer has any special status in RDF 1.1, so I'm hesitant to specify any particular serialization in the document, since some other form like Turtle or JSON-LD might come to predominate in the future. Currently I've said "Although there are a number of possible machine-readable formats, by default each resource should be described using RDF in a serialization recommended by the TDWG Technical Architecture Group (TAG)." Is that OK?

There might be a couple of issues with what you have currently. The first
is the reference to the Technical Architecture Group. I think it would be
much clearer and more effective if the recommendation were the
responsibility of the Process Interest Group. That group will persist. Of
the TAG I am not as certain.

Beyond that, I am not sure that a default response format has to be
defined, except maybe insofar as it must be in an existing standard format.

(Replying to Steve Baskauf's message of Sat, Apr 16, 2016, 3:01 PM on issue #39, quoted in full above.)

I sort of favor RDF in that it is the most (?) mature graph serialization. That said, I probably would be hard pressed to defend why I think graph-based models should apply to vocabularies.(*)

(*)Well actually, as an algebraist, my argument would be aesthetic: who could help but admire that there are a handful of digraphs that fully model polynomial algebra (as learned in high school) and that, when all the edge-arrows are turned around, you get the digraphs that describe tensor algebra (more simply than as taught to physics graduate students.) Alas, Category Theory largely remains unloved by computer scientists.

I was assuming that RDF would be the machine-readable form for vocabularies and metadata descriptions of other resources - it's not the only option, but it's the only one that has a history in TDWG up to now. I was really asking about RDF serializations (like XML vs. Turtle). Given that XML no longer has any special status under RDF 1.1 and that Turtle is now a W3C Recommendation, I would prefer to not specify any particular serialization of RDF in the document. As far as the role of the TAG is concerned, I'm in the dark about that. I'd be fine to say to use the recommendation of the Process IG.

The only problem is that we are willfully violating another existing TDWG standard (the GUID standard), so maybe we should request that it be changed at the same time this one is adopted. I'm not sure if that actually is possible under the current rules.

As to RDF serialization, I prefer RDF 1.1 Turtle to RDF/XML by a long shot.

I wonder if we might need to offer some guidance about use of JSON-LD for those who might dream about contributing their vocabularies to LOD... I am not sure what such guidance would be.

@baskaufs Can you say here what would be the specific violation of the GUID standard?

Recommendation 10 of the GUID Applicability Statement https://github.com/tdwg/guid-as/blob/master/guid/applicability_statement.doc states that the default response serialization for RDF should be XML.

That doesn't mean that it couldn't be provided in other serializations if the client requested them through content negotiation. It seems to me that the default serialization should be whatever is in widespread use. Up to this point, that's RDF/XML, but it could change in the future, which is why I left it vague in the specification.
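To make the content negotiation idea concrete, here is a minimal sketch of how a server might pick a serialization from the client's Accept header and fall back to a configurable default. The function name, the SUPPORTED list, and the choice of RDF/XML as DEFAULT are illustrative assumptions, not anything specified by TDWG. The parsing is deliberately simplified: partial wildcards such as text/* are ignored, and an unsatisfiable request gets the default rather than a 406 response.

```python
# Hypothetical names throughout; nothing here is prescribed by a TDWG document.
DEFAULT = "application/rdf+xml"  # assumed default, per GUID AS Recommendation 10
SUPPORTED = ("application/rdf+xml", "text/turtle", "application/ld+json")


def choose_serialization(accept_header, supported=SUPPORTED, default=DEFAULT):
    """Return the media type to serve for a given Accept header value."""
    if not accept_header:
        return default

    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # q defaults to 1 when the client omits it
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        if quality > 0:
            # Sort key: highest quality first, then order of appearance.
            candidates.append((-quality, position, media_type))

    for _, _, media_type in sorted(candidates):
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default

    # Nothing the client asked for is available: serve the default.
    return default
```

Changing the default later (say, to Turtle) would then be a one-line configuration change on the server, which is one argument for leaving it out of the specification text.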

I ran some test CONSTRUCT SPARQL queries through HTTP GET to a Stardog endpoint. It had no problem responding when I requested application/rdf+xml, text/turtle, or application/ld+json. So if at some point dereferencing of TDWG IRIs were set up to be handled from a triplestore (as I suggested somewhere else recently), we could provide the RDF in any form that people asked for. If we send them a hand-coded file that includes the whole vocabulary, then it would be more of a hassle to support multiple serializations, although it's not really that hard to use an RDF editor to convert from one form to another.

Maybe we should dodge the whole issue by not specifying the serialization and leave this up to the implementers of the server that dereferences TDWG IRIs. I'm starting to think that's the way to go.

BTW, here's the test. cURL or Postman an HTTP GET to the URL:

http://dev-rdf.library.vanderbilt.edu/syriaca/query?query=CONSTRUCT%20%7B%3Chttp%3A%2F%2Ffoo%3E%20a%20%3Fclass.%7D%20WHERE%20%7B%3Fs%20a%20%3Fclass.%7D%20LIMIT%2010

with an Accept header of one of the three MIME types I listed above.
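For anyone who would rather script the test than use cURL or Postman, the following sketch builds the same GET request with the Python standard library. The query string is simply the percent-decoded form of the URL above; the function name is hypothetical. The example only constructs and prints the requests, since the development endpoint may no longer be reachable.

```python
from urllib.parse import quote
from urllib.request import Request

ENDPOINT = "http://dev-rdf.library.vanderbilt.edu/syriaca/query"
# Percent-decoded form of the query in the URL above.
QUERY = "CONSTRUCT {<http://foo> a ?class.} WHERE {?s a ?class.} LIMIT 10"
MIME_TYPES = ("application/rdf+xml", "text/turtle", "application/ld+json")


def build_request(mime_type, endpoint=ENDPOINT, query=QUERY):
    """Build (but do not send) a GET request asking for one serialization."""
    url = endpoint + "?query=" + quote(query, safe="")
    return Request(url, headers={"Accept": mime_type})


if __name__ == "__main__":
    for mime_type in MIME_TYPES:
        request = build_request(mime_type)
        print(request.get_method(), request.full_url)
        print("Accept:", request.get_header("Accept"))
        # To actually send it (assumes the endpoint is still online):
        #   from urllib.request import urlopen
        #   body = urlopen(request, timeout=30).read()
```

If the endpoint responds, the Content-Type header of each response should match the requested MIME type.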

The GUID applicability statement has to be rewritten too, mainly to
deprecate the LSID recommendations. Their implementation is beyond the
majority of providers, and the requirement that they be rewritten as HTTP
IRIs depends on a tdwg.org resolver which no longer exists.

A good lesson for TDWG, though, and an opportunity for us to be
less prescriptive about the serialization.

Greg

(Replying to Steve Baskauf's comment of Tuesday, 3 May 2016 on issue #39, quoted in full above.)

Greg Whitbread
+61 418 670 368

taxamatics

The GUID applicability statement uses the terms SHOULD and MUST as defined in http://www.ietf.org/rfc/rfc2119.txt : "SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."

So I don't think we're in conflict, especially if the GUID applicability statement is to be revisited.

I agree with the sentiments above that it would be a mistake to mandate a particular RDF serialization. In fact, I think it would be a mistake to mandate RDF at all. My reading of the document (the draft TDWG Standards Documentation Specification) is that we are not mandating RDF but rather using RDF to illustrate the proper way to provide machine-interpretable representations of standards. From the intro to section 4: "The relationships described in this section may be expressed as Resource Description Framework (RDF), but that is not to the exclusion of other methods that may be available for expressing the same relationships in a manner that also facilitates machine processing."

I agree with Joel that perhaps it's best to stay out of the issue entirely and to say that the RDF Turtle examples are illustrative. The documentation specification should probably not specify any particular serialization, or even that the machine-readable form has to be RDF. There should be some machine-readable form that encodes the specified relationships, but what that is could change over time. I'm not sure that it's necessary to say what TDWG entity (TAG, executive, or whatever) will decide this, as the structure of TDWG may also evolve over time.

Generally made edits to refer to "machine-readable metadata" rather than specifically referring to RDF. Made an edit to section 1.3, removed reference to default serialization and the TAG in section 2.1.2, edited 2.2.1, 2.2.3.