Consider using a hypermedia format

Question

besquared opened this issue 11 years ago · comments

Have you guys considered using something like HAL or JSON-API instead of a bespoke JSON response? I think you're only getting part of REST without something like this.

Sebastien Ballesteros · Answer 1 · Mon Dec 23 2013 21:41:36 GMT+0800 (China Standard Time)

Yes, definitely. We were thinking of using Hydra, as we already use JSON-LD, but this is still open to discussion.

Josh Ferguson · Answer 2 · Wed Dec 25 2013 12:26:42 GMT+0800 (China Standard Time)

I don't know that much about JSON-LD or Hydra TBH. They seem pretty verbose, but I'm sure they're more complete than the simpler non-RDF formats. I suspect without more information I can't really say either way. TIME TO GO LEARN.

Sebastien Ballesteros · Answer 3 · Fri Jan 03 2014 11:07:48 GMT+0800 (China Standard Time)

OK, here is what I think I am going to do for JSON-LD support. Note that it involves departing from CommonJS for dependencies (a.k.a. dataDependencies in the data package protocol) and ignoring some parts of the spec in favor of schema.org.

vanilla datapackage.json:

{
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "name": "myurl",
      "url": "http://data.com"
    },
    {
      "name": "mycsv",
      "path": "data/data.csv"
    },
    {
      "name": "mydata",
      "data": { "inline": "data" }
    }
  ]
}

What I dislike about it:

no URL for the datapackage
no URLs for the dataDependencies (where can we get them?)
no URLs for the resources
no URLs for the content of the resource (if this content is relocated or made available in a different format)
not self-describing

still JSON but toward JSON-LD

Let's fix some of that with 2 new properties from JSON-LD:

@id to indicate missing URLs
@type to indicate classes (taking classes from schema.org/DataCatalog).

{
  "@id": "mydpkg/0.0.0",
  "@type": "DataCatalog",  
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "@id": "mydpkg/0.0.0/resource0",
      "@type": "DataSet",
      "name": "myurl",
      "url": "http://data.com",
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource0/myurl",
        "contentSize": 1024,
        "encodingFormat": "txt"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource1",
      "@type": "DataSet",
      "name": "mycsv",
      "path": "data/data.csv",
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource1/mycsv",
        "contentSize": 1024,
        "encodingFormat": "csv"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource2",
      "@type": "DataSet",
      "name": "mydata",
      "data": { "inline": "data" },
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource2/mydata",
        "contentSize": 1024,
        "encodingFormat": "json"
      }
    }
  ]
}
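A minimal Python sketch of this step, assuming the @id scheme used in the example above (name/version for the package, then /resourceN for each resource). The function name add_ids_and_types is hypothetical and not part of any data package tool, and the distribution blocks are left out for brevity.

```python
import copy


def add_ids_and_types(dpkg):
    """Return a copy of a vanilla datapackage dict with @id and @type added."""
    out = copy.deepcopy(dpkg)
    pkg_id = "{}/{}".format(out["name"], out["version"])

    # Put @id and @type first, as in the example above.
    annotated = {"@id": pkg_id, "@type": "DataCatalog"}
    annotated.update(out)

    # Each resource gets an @id derived from its position in the list.
    annotated["resources"] = [
        {"@id": "{}/resource{}".format(pkg_id, i), "@type": "DataSet", **res}
        for i, res in enumerate(out.get("resources", []))
    ]
    return annotated


vanilla = {
    "name": "mydpkg",
    "version": "0.0.0",
    "resources": [
        {"name": "myurl", "url": "http://data.com"},
        {"name": "mycsv", "path": "data/data.csv"},
    ],
}

annotated = add_ids_and_types(vanilla)
print(annotated["@id"])                  # mydpkg/0.0.0
print(annotated["resources"][1]["@id"])  # mydpkg/0.0.0/resource1
```

The input dict is not modified, so the vanilla datapackage.json can still be written out unchanged.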

Still not great:

URLs indicated by @id have no fixed base!
a machine still doesn't know that dataDependencies is a list of URLs or that url is a URL
@type helps a bit but is still far from being self-documenting (especially if I am a machine)...
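To illustrate the first point, here is a small Python sketch of why a relative @id is ambiguous without a fixed base. It uses the standard library's urljoin; the registry URL is taken from this thread, while the mirror URL is purely hypothetical.

```python
from urllib.parse import urljoin

REGISTRY = "http://registry.standardanalytics.io/"
MIRROR = "http://mirror.example.org/data/"  # hypothetical second base

refs = ["mydpkg/0.0.0", "/mydpkg/0.0.0/resource0/myurl", "http://data.com/"]

# The same reference resolves differently depending on the base in use.
resolved = {ref: (urljoin(REGISTRY, ref), urljoin(MIRROR, ref)) for ref in refs}

for ref, (on_registry, on_mirror) in resolved.items():
    print(ref)
    print("  registry:", on_registry)
    print("  mirror:  ", on_mirror)
```

Only the absolute URL (http://data.com/) comes out the same under both bases; the relative ones depend entirely on where the document is assumed to live.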

JSON-LD

Let's add a JSON-LD @context to fix all of that, using the semantics from schema.org when possible, or from http://schema.standardanalytics.io (not online yet) when a term is not available on schema.org or elsewhere:

{
  "@context": {
    "spec": "http://schema.standardanalytics.io/",
    "schema": "http://schema.org/",
    "@base": "http://registry.standardanalytics.io/",
    "url": { "@id": "schema:url", "@type": "@id" },
    "contentUrl": { "@id": "schema:contentUrl", "@type": "@id" },
    "contentSize": "schema:contentSize",
    "encodingFormat": "schema:encodingFormat",
    "name": "schema:name",
    "version": "schema:version",
    "keywords": { "@id": "schema:keywords", "@container": "@list" },
    "author": "schema:author",
    "email": {"@id": "http://xmlns.com/foaf/0.1/mbox", "@type": "@id"},    
    "dataDependencies": {
      "@id": "spec:dataDependencies",
      "@type": "@id",
      "@container": "@set"
    },
    "data": "spec:data",
    "resources": {
      "@id": "spec:resources",
      "@container": "@set"
    },
    "distribution": "schema:distribution",
    "DataCatalog": { "@id": "schema:DataCatalog", "@type": "@id" },
    "DataDownload": { "@id": "schema:DataDownload", "@type": "@id" },
    "DataSet": { "@id": "schema:DataSet", "@type": "@id" }
  }
}

With an @context, we can expand our JSON so that:

Everything is self-describing: just follow the URLs of the keys
Any client that understands JSON-LD knows which properties can be dereferenced

{
  "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0",
  "@type": "http://schema.org/DataCatalog",
  "http://schema.org/author": {
    "http://schema.org/name": "Sebastien Ballesteros",
    "http://xmlns.com/foaf/0.1/mbox": {
      "@id": "mailto:sebastien@standardanalytics.io"
    }
  },
  "http://schema.org/keywords": ["hyperdata", "json-ld", "test"],
  "http://schema.org/name": "mydpkg",
  "http://schema.org/version": "0.0.0",
  "http://schema.standardanalytics.io/dataDependencies": [{
    "@id": "http://registry.standardanalytics.io/dpkg1/0.0.1"
  }, {
    "@id": "http://registry.standardanalytics.io/dpkg2/0.2.1"
  }, {
    "@id": "http://data.com/"
  }],
  "http://schema.standardanalytics.io/resources": [{
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource0/myurl"
      },
      "http://schema.org/encodingFormat": "txt"
    },
    "http://schema.org/name": "myurl",
    "http://schema.org/url": {
      "@id": "http://data.com"
    }
  }, {
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource1",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource1/mycsv"
      },
      "http://schema.org/encodingFormat": "csv"
    },
    "http://schema.org/name": "mycsv"
  }, {
    "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource2",
    "@type": "http://schema.org/DataSet",
    "http://schema.org/distribution": {
      "@type": "http://schema.org/DataDownload",
      "http://schema.org/contentSize": 1024,
      "http://schema.org/contentUrl": {
        "@id": "http://registry.standardanalytics.io/mydpkg/0.0.0/resource2/mydata"
      },
      "http://schema.org/encodingFormat": "json"
    },
    "http://schema.org/name": "mydata",
    "http://schema.standardanalytics.io/data": {}
  }]
}
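How the keys get their full URLs can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a conforming JSON-LD processor: it only expands top-level keys, ignores @base, @type coercion and containers, and uses a trimmed-down copy of the context above. The function names expand_term and expand_keys are hypothetical; for real use, an existing JSON-LD library would be the safer choice.

```python
CONTEXT = {
    "schema": "http://schema.org/",
    "spec": "http://schema.standardanalytics.io/",
    "name": "schema:name",
    "version": "schema:version",
    "dataDependencies": {"@id": "spec:dataDependencies", "@type": "@id"},
}


def expand_term(term, context):
    """Map a short key to its full IRI using the prefixes in the context."""
    definition = context[term]
    if isinstance(definition, dict):
        definition = definition["@id"]
    prefix, sep, suffix = definition.partition(":")
    # Expand compact IRIs such as "schema:name"; leave absolute IRIs alone.
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return definition


def expand_keys(doc, context):
    """Expand the top-level keys of doc; values are left untouched."""
    expanded = {}
    for key, value in doc.items():
        if key.startswith("@"):
            expanded[key] = value  # JSON-LD keywords are kept as-is
        elif key in context:
            expanded[expand_term(key, context)] = value
        # Keys with no definition in the context (e.g. "path") are dropped.
    return expanded


doc = {"@id": "mydpkg/0.0.0", "name": "mydpkg", "version": "0.0.0", "path": "data/data.csv"}
result = expand_keys(doc, CONTEXT)
for key, value in result.items():
    print(key, "=", value)
```

Dropping undefined keys is why path does not appear in the expanded document above: it has no entry in the @context.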

All good but there is more:

Everything can now be converted to RDFa 1.1 Lite, for instance, and used in plain HTML (with the search engines' blessing)!
Extra @context can be added (for instance to the csv resources to add semantic meaning to the column names)...

linked data FTW!

Josh Ferguson · Answer 4 · Fri Jan 03 2014 12:50:01 GMT+0800 (China Standard Time)

This is difficult to parse. Isn't there a way where your keys don't need the full URL?

Sebastien Ballesteros · Answer 5 · Fri Jan 03 2014 12:54:43 GMT+0800 (China Standard Time)

@besquared not sure I understand.
But you have:
http://json-ld.org/spec/latest/json-ld/#interpreting-json-as-json-ld

or:

{
  "@context": "http://example.com/context.jsonld",
  "key": "key doesn't need to be expanded"
}

or (if you end up with something already expanded):
http://json-ld.org/spec/latest/json-ld/#compacted-document-form

Edit: I just found this while re-reading the spec:

JSON-LD's media type defines a profile parameter which can be used to signal or request compacted document form. The profile URI identifying compacted document form is http://www.w3.org/ns/json-ld#compacted.
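As a rough sketch, a client could ask for the compacted form by putting that profile URI in the Accept header. The Python below only builds the request and does not send it; the helper name compacted_request is hypothetical, and whether the registry would honor the profile parameter is an assumption.

```python
from urllib.request import Request

COMPACTED_PROFILE = "http://www.w3.org/ns/json-ld#compacted"


def compacted_request(url):
    """Build a request asking for JSON-LD in compacted document form."""
    accept = 'application/ld+json; profile="{}"'.format(COMPACTED_PROFILE)
    return Request(url, headers={"Accept": accept})


req = compacted_request("http://registry.standardanalytics.io/mydpkg/0.0.0")
print(req.get_header("Accept"))
# application/ld+json; profile="http://www.w3.org/ns/json-ld#compacted"
```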

Sebastien Ballesteros · Answer 6 · Fri Jan 03 2014 13:45:10 GMT+0800 (China Standard Time)

To be clear, here is a compacted form of datapackage.jsonld:

{
  "@context": "http://schema.standardanalytics.io/datapackage/context.jsonld",
  "@id": "mydpkg/0.0.0",
  "@type": "DataCatalog",  
  "name": "mydpkg",
  "version": "0.0.0",
  "keywords": ["hyperdata", "json-ld", "test"],
  "dataDependencies": ["dpkg1/0.0.1", "dpkg2/0.2.1", "http://data.com/"],
  "author": {
    "name": "Sebastien Ballesteros",
    "email": "sebastien@standardanalytics.io"
  },
  "resources": [
    {
      "@id": "mydpkg/0.0.0/resource0",
      "@type": "DataSet",
      "name": "myurl",
      "url": "http://data.com",
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource0/myurl",
        "contentSize": 1024,
        "encodingFormat": "txt"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource1",
      "@type": "DataSet",
      "name": "mycsv",
      "path": "data/data.csv",
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource1/mycsv",
        "contentSize": 1024,
        "encodingFormat": "csv"
      }
    },
    {
      "@id": "mydpkg/0.0.0/resource2",
      "@type": "DataSet",
      "name": "mydata",
      "data": { "inline": "data" },
      "distribution": {
        "@type": "DataDownload",
        "contentUrl": "/mydpkg/0.0.0/resource2/mydata",
        "contentSize": 1024,
        "encodingFormat": "json"
      }
    }
  ]
}

Josh Ferguson · Answer 7 · Sat Jan 04 2014 14:30:00 GMT+0800 (China Standard Time)

I think this is a pretty nice format. I guess I'm still not sure what kinds of things people do with JSON-LD, although I understand in principle why it might be valuable. This seems like something that could be offered optionally for data packages. It's more a matter of getting support from data package tool builders.