sballesteros / dcat

Archive and make discoverable data and links with metadata.



Usage (CLI)


dcat --help

Registering a user (adduser)


dcat adduser

and follow the prompting wizard.

Publishing (publish)

Simple document

dcat publishes JSON-LD documents that use the registry's context. This context also defines terms needed for I/O and for preserving data integrity (such as filePath and Checksum).

At a minimum, a document has to contain:

  • a context (@context), and
  • an id (@id) that uniquely identifies the published resource with a URL. All relative URLs are resolved against the base (@base) defined in the context.


{
  "@context": "",
  "@id": "mydoc"
}

To publish this document (mydoc), save it in a file named JSONLD and, from the directory containing that file, run:

dcat publish

After publication the document will be available at

Documents can contain properties from any ontology as long as the associated @context is provided.


If a version property is specified in the document, the document will be versioned, that is, each update will require a new version value in order to be published (this prevents existing versions from being overwritten).

When appropriate, version numbers SHOULD follow semantic versioning.


{
  "@context": "",
  "@id": "mydoc",
  "version": "0.0.1"
}

After publication this document will be available at whereas the latest version will always be available at

If the document is versioned following Semantic Versioning, a range (e.g. <0.0.1) can be specified as the version.
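
The overwrite protection described above for versioned documents can be sketched as follows. This is an illustrative in-memory model, not the registry's actual implementation; `createStore` and its `publish` method are hypothetical names introduced for this example.

```javascript
// Hypothetical in-memory store illustrating the rule that a versioned
// document cannot overwrite a version that is already published.
function createStore() {
  const versions = new Map(); // "@id" -> Set of published version strings

  return {
    publish(doc) {
      const id = doc['@id'];
      if (!id) throw new Error('document needs an @id');
      // Without a version property the versioning rule does not apply.
      if (doc.version === undefined) return 'unversioned';
      const seen = versions.get(id) || new Set();
      if (seen.has(doc.version)) {
        throw new Error(id + ' version ' + doc.version + ' is already published');
      }
      seen.add(doc.version);
      versions.set(id, seen);
      return 'published';
    }
  };
}

const store = createStore();
store.publish({ '@id': 'mydoc', version: '0.0.1' }); // accepted
store.publish({ '@id': 'mydoc', version: '0.0.2' }); // accepted: new version value
```

Publishing `0.0.1` a second time against the same store would throw, which mirrors the requirement that each update carries a new version value.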


Documents can be arbitrarily complex (with multiple nodes), and it sometimes makes sense to assign a URL to a node so that it can be referenced. This is achieved by setting an @id property on the desired nodes:


{
  "@context": "",
  "@id": "mydoc",
  "version": "0.0.1",
  "hasPart": {
    "@id": "mydoc/data",
    "@type": "Dataset",
    "description": "a dataset part of the document"
  }
}

The whole document can be retrieved at whereas the part (node) can be retrieved at

Note: node @ids can be any valid URLs, but they have to be namespaced within the top-level @id (for a document with "@id": "mydoc", "@id": "mydoc/arbitrarily/long/pathname" is valid whereas "@id": "part" is not).
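
The namespacing rule in the note above can be checked before publishing with a small helper. This is a minimal sketch, not part of dcat: `collectIds` and `findBadNodeIds` are hypothetical names, and the check only compares relative @id strings (absolute URLs would first need to be resolved against the @base).

```javascript
// Collect every "@id" found in a JSON-LD-like object tree (the @context is skipped).
function collectIds(node, ids = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectIds(child, ids));
  } else if (node !== null && typeof node === 'object') {
    if (typeof node['@id'] === 'string') ids.push(node['@id']);
    Object.keys(node).forEach((key) => {
      if (key !== '@context') collectIds(node[key], ids);
    });
  }
  return ids;
}

// Return the nested "@id" values that are not namespaced under the top-level "@id".
function findBadNodeIds(doc) {
  const root = doc['@id'];
  return collectIds(doc).filter((id) => id !== root && !id.startsWith(root + '/'));
}

const doc = {
  '@id': 'mydoc',
  hasPart: [
    { '@id': 'mydoc/data', '@type': 'Dataset' },
    { '@id': 'part', '@type': 'Dataset' }
  ]
};
console.log(findBadNodeIds(doc)); // prints [ 'part' ]
```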

Adding metadata to existing URLs

dcat can be used to add machine-readable metadata to any resource already published on the web. For instance, running:

dcat init

we get a basic machine-readable document:

{
  "@context": "",
  "@id": "mydoc",
  "@type": "Code",
  "codeRepository": "",
  "encoding": {
    "@type": "MediaObject",
    "contentUrl": "",
    "encodingFormat": "application/x-gzip",
    "contentSize": 690980
  }
}

This document should be extended with more properties (such as author, contributor, about, programmingLanguage, runtime..., or properties from any other web ontology, taking care to add the corresponding contexts) to improve the discoverability and reusability of the resource.

Note: in addition to absolute URLs, dcat supports CURIEs for the prefixes defined in the @context. Using a CURIE, the previous command simplifies to:

dcat init github:standard-analytics/dcat.git
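
To make the CURIE mechanism concrete, here is a minimal sketch of prefix expansion. It is not dcat's own code: `expandCurie` is a hypothetical helper, and the URL mapped to the `github` prefix is an assumption made for illustration (the actual mapping is whatever the @context defines).

```javascript
// Expand a CURIE such as "github:standard-analytics/dcat.git" using a prefix map.
// Values with an unknown prefix, and absolute URLs, are returned unchanged.
function expandCurie(value, prefixes) {
  const idx = value.indexOf(':');
  if (idx === -1) return value;
  const prefix = value.slice(0, idx);
  const suffix = value.slice(idx + 1);
  const known = Object.prototype.hasOwnProperty.call(prefixes, prefix);
  // "scheme://..." is treated as an absolute URL, never as a CURIE.
  if (!known || suffix.startsWith('//')) return value;
  return prefixes[prefix] + suffix;
}

// Assumed prefix mapping, for illustration only.
const prefixes = { github: 'https://github.com/' };
console.log(expandCurie('github:standard-analytics/dcat.git', prefixes));
// prints https://github.com/standard-analytics/dcat.git
```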

Files (raw data)

For types such as Dataset, Code, SoftwareApplication, Article, Book, ImageObject, VideoObject, AudioObject, ..., dcat allows the publication of raw data from files (including datasets, binaries, images, media, and more) along with documents.

For instance, if you have a PDF of a MedicalScholarlyArticle and an associated Dataset in CSV, you can run:

dcat init --main article.pdf::MedicalScholarlyArticle --part data.csv

Note: ::MedicalScholarlyArticle associates a type (@type) with the resource (article.pdf).

This will generate a machine-readable document (JSONLD) that you can edit to provide additional metadata.

{
  "@context": "",
  "@id": "mydoc",
  "@type": "MedicalScholarlyArticle",
  "encoding": {
    "@type": "MediaObject",
    "filePath": "article.pdf"
  },
  "hasPart": {
    "@type": "Dataset",
    "distribution": {
      "@type": "DataDownload",
      "filePath": "data.csv"
    }
  }
}

After publication (dcat publish), the document acquires additional URL properties that can be dereferenced to retrieve the original raw data:

{
  "@context": "",
  "@id": "mydoc",
  "@type": "MedicalScholarlyArticle",
  "encoding": {
    "@type": "MediaObject",
    "filePath": "article.pdf",
    "contentUrl": "" //generated URL
  },
  "hasPart": {
    "@type": "Dataset",
    "distribution": {
      "@type": "DataDownload",
      "filePath": "data.csv",
      "contentUrl": "" //generated URL
    }
  }
}

Note: dcat init supports globbing so you can run commands like:

dcat init --main article.pdf --part *.csv

or repeat --part (or the shorter -p) if you need more complex matching e.g.

dcat init -m article.pdf -p *.csv -p *.jpg


Directories are published as tarballs. For instance, running

dcat init -m src::Code --id cproject

where src is a directory of source files

src
├── lib.h
└── main.c

will generate:

{
  "@context": "",
  "@id": "cproject",
  "@type": "Code",
  "programmingLanguage": { "name": "c" },
  "encoding": {
    "@type": "MediaObject",
    "encodingFormat": "application/x-gtar",
    "hasPart": [
      { "@type": "MediaObject", "filePath": "src/lib.h" },
      { "@type": "MediaObject", "filePath": "src/main.c" }
    ]
  }
}

After publication, the MediaObject will have a contentUrl property indicating where the tarball can be retrieved.

Unpublishing (unpublish)

To delete a specific version of a document with "@id": "mydoc", run:

dcat unpublish ldr:mydoc?version=0.1.1

ldr is the registry prefix defined in the @context.

To delete all versions of a document with "@id": "mydoc", run:

dcat unpublish ldr:mydoc

Retrieving documents and raw data (search, show, clone)


Documents containing keywords, name or description properties can be searched by keyword with dcat search followed by a list of keywords.

For more powerful search, all published data are valid linked data fragments and can be queried using SPARQL.

Show (expanded, compacted, flattened, normalized)

dcat show followed by a CURIE displays the latest JSON-LD document corresponding to the CURIE on stdout.

Different options (-e, --expand, -f, --flatten, -c, --compact, -n, --normalize) provide alternative representations of the document. For instance,

dcat show ldr:mydoc?version=<2.1.0 --normalize

will serialize the latest version lower than 2.1.0 of the document with "@id": "mydoc" to N-Quads (RDF).
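
How a range such as <2.1.0 picks a version can be illustrated with a small sketch. This is an assumption about the selection logic, not the registry's code: `parseVersion`, `compareVersions` and `latestBelow` are hypothetical helpers, and only plain major.minor.patch strings with a strict upper bound are handled (no pre-release tags or other range operators).

```javascript
// Parse "major.minor.patch" into an array of three numbers.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error('unsupported version: ' + v);
  return m.slice(1).map(Number);
}

// Negative if a < b, positive if a > b, 0 if equal (numeric, not lexical).
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Latest published version strictly lower than `bound`, or null if none.
function latestBelow(published, bound) {
  const candidates = published.filter((v) => compareVersions(v, bound) < 0);
  if (candidates.length === 0) return null;
  return candidates.sort(compareVersions)[candidates.length - 1];
}

console.log(latestBelow(['0.0.1', '2.0.3', '2.1.0', '1.10.0'], '2.1.0')); // prints 2.0.3
```

Comparing the parts numerically matters: a plain string comparison would rank 1.10.0 below 1.9.0.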


dcat clone followed by a CURIE downloads the raw data associated with a document and stores them along with the document on disk at the paths specified by the filePath properties.
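
The first step of a clone, working out which URL goes to which path, can be sketched as a walk over the published document. This is a hedged illustration rather than dcat's implementation: `listDownloads` is a hypothetical helper, and the example.org URLs are placeholders standing in for the generated contentUrl values.

```javascript
// Walk a published document and list every node that has both a
// filePath (where to write) and a contentUrl (where to download from).
function listDownloads(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => listDownloads(child, out));
  } else if (node !== null && typeof node === 'object') {
    if (typeof node.filePath === 'string' && typeof node.contentUrl === 'string') {
      out.push({ from: node.contentUrl, to: node.filePath });
    }
    Object.values(node).forEach((child) => listDownloads(child, out));
  }
  return out;
}

// Placeholder document shaped like the published example in "Files (raw data)".
const published = {
  '@id': 'mydoc',
  encoding: { filePath: 'article.pdf', contentUrl: 'https://example.org/r/article.pdf' },
  hasPart: {
    distribution: { filePath: 'data.csv', contentUrl: 'https://example.org/r/data.csv' }
  }
};
console.log(listDownloads(published).map((d) => d.to)); // prints [ 'article.pdf', 'data.csv' ]
```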

Listing / Adding / Removing maintainers (maintainer)

Only maintainers of a document can publish or remove versions of a document. Maintainers of a document can be listed with:

dcat maintainer ls <CURIE>

Maintainers can give users maintainer rights by running:

dcat maintainer add <user CURIE> <doc CURIE>

Note: all users have a CURIE of the form ldr:users/{username}

Maintainers can remove maintainer rights by running:

dcat maintainer rm <user CURIE> <doc CURIE>


dcat can also be used programmatically.

var Dcat = require('dcat');
var dcat = new Dcat();

var doc = {
  '@context': '',
  '@id': 'test',
  name: 'hello world'
};

dcat.publish(doc, function(err, cdoc){
  console.log(err, cdoc); //cdoc is compacted
});

See test/test.js for more examples.
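
If you prefer promises over callbacks, a callback-style publish method like the one above can be wrapped with Node's util.promisify. This is a sketch under assumptions: `publishAsync` is a hypothetical helper, and `fakeClient` is a stand-in used so the example runs without a registry; with the real module you would pass the `dcat` instance instead.

```javascript
const { promisify } = require('util');

// Wrap a callback-style publish(doc, callback) method so it returns a Promise.
function publishAsync(client, doc) {
  return promisify(client.publish.bind(client))(doc);
}

// Stand-in client for illustration only; it echoes the document back asynchronously.
const fakeClient = {
  publish(doc, callback) {
    setImmediate(() => callback(null, Object.assign({}, doc)));
  }
};

publishAsync(fakeClient, { '@id': 'test', name: 'hello world' })
  .then((cdoc) => console.log(cdoc['@id'])) // prints test
  .catch((err) => console.error(err));
```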


package.json -> datapackage.json -> package.jsonld -> JSON-LD + + hydra + linked data fragment.


By default, dcat uses a linked data registry hosted on Cloudant.


You need a local instance of the linked data registry running on your machine on port 3000. Then, run:

npm test





License:Apache License 2.0


Language:JavaScript 100.0%