sballesteros / linked-data-registry

A CouchDB powered registry for linked data.

Consider using a hypermedia format

besquared opened this issue

Have you guys considered using something like HAL or JSON-API instead of a bespoke JSON response? I think you're only getting part of REST without something like this.

Yes, definitely. We were thinking of using Hydra, as we already use JSON-LD, but this is still open to discussion.

I don't know that much about JSON-LD or Hydra, TBH. They seem pretty verbose, but I'm sure they're more complete than the simpler non-RDF formats. Without more information I can't really say either way. TIME TO GO LEARN.

OK, here is what I think I am going to do for JSON-LD support. Note that it involves departing from CommonJS for dependencies (a.k.a. dataDependencies in the data package protocol) and also ignoring some parts of the spec in favor of schema.org.

vanilla datapackage.json:

{
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "name": "myurl",
      "url": "http://data.com"
    },
    {
      "name": "mycsv",
      "path": "data/data.csv"
    },
    {
      "name": "mydata",
      "data": { "inline": "data" }
    }
  ]
}

What I dislike about it:

  1. no URL for the datapackage
  2. no URLs for the dataDependencies (where can we get them?)
  3. no URLs for the resources
  4. no URLs for the content of the resource (e.g. if this content is relocated or made available in a different format)
  5. not self-describing

still JSON but toward JSON-LD

Let's fix some of that with two JSON-LD keywords, @id and @type, plus a distribution property on each resource:

{
  "@id": "mydpkg/0.0.0",
  "@type": "DataCatalog",  
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "@id": "mydpkg/0.0.0/resource0",
      "@type": "DataSet",
      "name": "myurl",
      "url": "http://data.com",
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/myurl",
        "contentSize" : 1024,
        "encodingFormat": "txt"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource1",
      "@type": "DataSet",
      "name": "mycsv",
      "path": "data/data.csv",
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/mycsv",
        "contentSize" : 1024,
        "encodingFormat": "csv"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource2",
      "@type": "DataSet",
      "name": "mydata",
      "data": { "inline": "data" },
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/mydata",
        "contentSize" : 1024,
        "encodingFormat": "json"
      }
    }
  ]
}

Still not great:

  • URLs indicated by @id have no fixed base!
  • a machine still doesn't know that dataDependencies is a list of URLs or that url is a URL
  • @type helps a bit, but this is still far from being self-documenting (especially if I am a machine)...
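
To illustrate the first point, here is a minimal Python sketch (not part of the registry) of how the same relative @id and contentUrl resolve to different absolute URLs depending on where the document is served from. The mirror.example.org base is a hypothetical second host invented for illustration; only the standard library's urljoin is used.

```python
from urllib.parse import urljoin

# Identifiers copied from the example above.
relative_id = "mydpkg/0.0.0/resource0"
content_url = "/mydpkg/0.0.0/resource0/myurl"

# Two possible base URLs; the second one is hypothetical.
bases = [
    "http://registry.standardanalytics.io/",
    "http://mirror.example.org/packages/",
]

# Resolve each reference against each base, as a URL-aware client would.
resolved = {
    base: (urljoin(base, relative_id), urljoin(base, content_url))
    for base in bases
}

for base, (id_iri, content_iri) in resolved.items():
    print(base, "->", id_iri, "|", content_iri)
```

Without an agreed base, the two clients end up with different identifiers for the same resource. Note also that the leading slash in contentUrl makes it resolve against the host root rather than the base path.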

JSON-LD

Let's add a JSON-LD @context to fix all of that, using the semantics of schema.org when possible, or of http://schema.standardanalytics.io (not online yet) when a term is not available on schema.org or elsewhere:

{
  "@context": {
    "spec": "http://schema.standardanalytics.io/",
    "schema": "http://schema.org/",
    "@base": "http://registry.standardanalytics.io/",
    "url": { "@id": "schema:url", "@type": "@id" },
    "contentUrl": { "@id": "schema:contentUrl", "@type": "@id" },
    "contentSize": "schema:contentSize",
    "encodingFormat": "schema:encodingFormat",
    "name": "schema:name",
    "version": "schema:version",
    "keywords": { "@id": "schema:keywords", "@container": "@list" },
    "author": "schema:author",
    "email": {"@id": "http://xmlns.com/foaf/0.1/mbox", "@type": "@id"},    
    "dataDependencies": {
      "@id": "spec:dataDependencies",
      "@type": "@id",
      "@container": "@set"
    },
    "data": "spec:data",
    "resources": {
      "@id": "spec:resources",
      "@container": "@set"
    },
    "distribution": "schema:distribution",
    "DataCatalog": { "@id": "schema:DataCatalog", "@type": "@id" },
    "DataDownload": { "@id": "schema:DataDownload", "@type": "@id" },
    "DataSet": { "@id": "schema:DataSet", "@type": "@id" }
  }
}

With an @context, we can expand our JSON so that:

  • everything is self-described: just follow the URLs of the keys,
  • any client that understands JSON-LD knows which properties can be dereferenced.

{
  "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0",
  "@type": "http://schema.org/DataCatalog",
  "http://schema.org/author": {
    "http://schema.org/name": "Sebastien Ballesteros",
    "http://xmlns.com/foaf/0.1/mbox": {
      "@id": "mailto:sebastien@standardanalytics.io"
    }
  },
  "http://schema.org/keywords": ["hyperdata", "json-ld", "test"],
  "http://schema.org/name": "mydpkg",
  "http://schema.org/version": "0.0.0",
  "http://schema.standardanalytics.io/dataDependencies": [{
    "@id": "http://registry.standardanalytics.io/dpkg1/0.0.1"
  }, {
    "@id": "http://registry.standardanalytics.io/dpkg2/0.2.1"
  }, {
    "@id": "http://data.com/"
  }],
  "http://schema.standardanalytics.io/resources": [{
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0/myurl"
      },
      "http://schema.org/encodingFormat": "txt"
    },
    "http://schema.org/name": "myurl",
    "http://schema.org/url": {
      "@id": "http://data.com"
    }
  }, {
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource1",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0/mycsv"
      },
      "http://schema.org/encodingFormat": "csv"
    },
    "http://schema.org/name": "mycsv"
  }, {
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource2",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0/mydata"
      },
      "http://schema.org/encodingFormat": "json"
    },
    "http://schema.org/name": "mydata",
    "http://schema.standardanalytics.io/data": {}
  }]
}
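
The key rewriting in the expanded document above can be sketched in a few lines of Python. This is only an illustration of how terms map to IRIs through the @context, not a JSON-LD processor: it ignores @base, @type coercion, containers, and nested contexts. expand_term is a hypothetical helper name, and the context is a subset of the one given in this thread.

```python
# Subset of the @context from this thread.
context = {
    "spec": "http://schema.standardanalytics.io/",
    "schema": "http://schema.org/",
    "name": "schema:name",
    "url": {"@id": "schema:url", "@type": "@id"},
    "email": {"@id": "http://xmlns.com/foaf/0.1/mbox", "@type": "@id"},
    "dataDependencies": {
        "@id": "spec:dataDependencies",
        "@type": "@id",
        "@container": "@set",
    },
}


def expand_term(term, context):
    """Return the full IRI for a term, or None if the context does not define it."""
    definition = context.get(term)
    if definition is None:
        return None
    # A term definition is either a plain string or an object with an "@id".
    iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = iri.partition(":")
    # Compact IRI such as "schema:name": swap the prefix for its namespace.
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    # Otherwise the value is already an absolute IRI.
    return iri


terms = ["name", "url", "email", "dataDependencies", "path"]
expanded = {term: expand_term(term, context) for term in terms}

for term, iri in expanded.items():
    print(term, "->", iri)
```

Terms that are missing from the context, such as path, have no IRI. That is why path (and the inline key under data) do not appear in the expanded document.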

All good but there is more:

  • Everything can now be converted to RDFa 1.1 Lite, for instance, and used in plain HTML (with the search engines' blessing)!
  • Extra @context entries can be added (for instance, on the CSV resources, to give semantic meaning to the column names)...

linked data FTW!

This is difficult to parse. Isn't there a form in which the keys don't need the full URL?

@besquared I'm not sure I understand the question.
But you have:
http://json-ld.org/spec/latest/json-ld/#interpreting-json-as-json-ld

or:

{
  "@context": "http://example.com/context.jsonld",
  "key": "key doesn't need to be expanded"
}

or (if you end up with something already expanded):
http://json-ld.org/spec/latest/json-ld/#compacted-document-form

Edit: I just found this while re-reading the spec:

JSON-LD's media type defines a profile parameter which can be used to signal or request compacted document form. The profile URI identifying compacted document form is http://www.w3.org/ns/json-ld#compacted.
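
As a sketch of what that could look like from a client, the following Python builds (but does not send) a request asking for the compacted form. The media type application/ld+json and the profile URI come from the JSON-LD spec; the package URL is hypothetical, and whether the registry would honor the profile parameter is an assumption.

```python
from urllib.request import Request

# Profile URI for compacted document form, as quoted from the spec above.
COMPACTED = "http://www.w3.org/ns/json-ld#compacted"

# Hypothetical package URL, built from the @base used in this thread.
url = "http://registry.standardanalytics.io/mydpkg/0.0.0"

# Ask for JSON-LD and signal the preferred document form via the profile parameter.
request = Request(url, headers={
    "Accept": f'application/ld+json; profile="{COMPACTED}"',
})

# Nothing is sent here; the request object is only inspected.
print(request.get_header("Accept"))
```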

To be clear, here is a compacted form of datapackage.jsonld:

{
  "@context": "http://schema.standardanalytics.io/datapackage/context.jsonld",
  "@id": "mydpkg/0.0.0",
  "@type": "DataCatalog",  
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "@id": "mydpkg/0.0.0/resource0",
      "@type": "DataSet",
      "name": "myurl",
      "url": "http://data.com",
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/myurl",
        "contentSize" : 1024,
        "encodingFormat": "txt"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource1",
      "@type": "DataSet",
      "name": "mycsv",
      "path": "data/data.csv",
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/mycsv",
        "contentSize" : 1024,
        "encodingFormat": "csv"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource2",
      "@type": "DataSet",
      "name": "mydata",
      "data": { "inline": "data" },
      "distribution": {
        "@type" :"DataDownload",
        "contentUrl" :"/mydpkg/0.0.0/resource0/mydata",
        "contentSize" : 1024,
        "encodingFormat": "json"
      }
    }
  ]
}

I think this is a pretty nice format. I guess I'm still not sure what kinds of things people do with JSON-LD, although I understand in principle why it might be valuable. This seems like something that could be offered optionally for data packages; it is more a matter of getting support from data package tool builders.