Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes


Give every resource its own table

Robsteranium opened this issue

We're currently quite succinct with our declarations. In the codelist pipeline, for example, rather than generating a single-row csvw table for the concept scheme resource, we create that resource with annotations in the metadata for the table of concepts.

This isn't necessarily wrong, but it can lead to complications. It might be cleaner to create a table for each resource, even if this means creating lots of tables/files with just a single row in each.
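To make the contrast concrete, here is a rough sketch of the two metadata shapes, built as plain Python dicts and dumped as JSON. Everything specific in it is an assumption for illustration: the URIs, the file names, the column names, and the choice of `prov:hadDerivation` as the annotation property. It isn't meant to reproduce what the codelist pipeline actually emits.

```python
import json

# Hypothetical URIs, for illustration only.
SCHEME_URI = "http://example.org/def/concept-scheme/gender"
CONCEPT_URI = "http://example.org/def/concept/gender/{notation}"

# Shared description of the table of concepts (one row per skos:Concept).
CONCEPTS_SCHEMA = {
    "aboutUrl": CONCEPT_URI,
    "columns": [
        {"name": "label", "titles": "Label", "propertyUrl": "rdfs:label"},
        {"name": "notation", "titles": "Notation", "propertyUrl": "skos:notation"},
        {"name": "type", "virtual": True,
         "propertyUrl": "rdf:type", "valueUrl": "skos:Concept"},
        {"name": "scheme", "virtual": True,
         "propertyUrl": "skos:inScheme", "valueUrl": SCHEME_URI},
    ],
}


def annotated_metadata():
    """Succinct style: the concept scheme is an annotation on the concepts table."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": "concepts.csv",
        "tableSchema": CONCEPTS_SCHEMA,
        # Table-level annotation (property name is an assumption). It is not
        # derived from any cell, so minimal-mode csv2rdf output leaves it out.
        "prov:hadDerivation": {
            "@id": SCHEME_URI,
            "@type": "skos:ConceptScheme",
            "rdfs:label": "Gender",
        },
    }


def table_per_resource_metadata():
    """Tidier style: the concept scheme gets its own single-row table."""
    scheme_table = {
        "url": "concept-scheme.csv",  # hypothetical one-row file with a Label column
        "tableSchema": {
            "aboutUrl": SCHEME_URI,
            "columns": [
                {"name": "label", "titles": "Label", "propertyUrl": "rdfs:label"},
                {"name": "type", "virtual": True,
                 "propertyUrl": "rdf:type", "valueUrl": "skos:ConceptScheme"},
            ],
        },
    }
    concepts_table = {"url": "concepts.csv", "tableSchema": CONCEPTS_SCHEMA}
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "tables": [scheme_table, concepts_table],
    }


if __name__ == "__main__":
    print(json.dumps(annotated_metadata(), indent=2))
    print(json.dumps(table_per_resource_metadata(), indent=2))
```

In the second shape every statement about the concept scheme comes from a cell or a virtual column, so it no longer depends on annotations being carried through to the output.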

One example of a complication is that the minimal mode of csv2rdf doesn't include annotations or notes, meaning we need to run in standard mode to get all of the resources. Standard mode is quite verbose (it includes a lot of csvw auditing, e.g. csvw:Row descriptions of the input file's provenance), so we've found ourselves needing an intermediate level between the two (#85).

Another example is re-using the components table to generate the DSD, which leads to duplicate statements in the output (#64).

These consequences aren't themselves critical, but they wouldn't exist if we had a (tidier) one-resource-per-row approach in the first place. I'm creating this issue now to document this, in the hope that we might realise when we're headed towards other "second best" solutions, and potentially correct this instead of implementing those.

An alternative approach would be to provide the DataSet, DSD and ComponentSpecifications as json-ld annotations, as per the Cambourne weather data example provided by the authors of the csvw spec.

We might not want to include definitions of the component-properties as these are built with the components-pipeline (i.e. we could just point to their URIs in the component-specifications).
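As a rough illustration of that last point, the sketch below builds a json-ld node for a DataSet whose DSD only references component properties by URI, without redefining them. The base URI, the property URIs and the helper names are all hypothetical, and which table-level property would carry this node in the csvw metadata is left open.

```python
import json

# Hypothetical dataset URI, for illustration only.
DATASET_URI = "http://example.org/data/example-cube"


def component_spec(kind, property_uri):
    """Build a component specification that points at an existing property.

    `kind` is one of "dimension", "measure" or "attribute"; the property
    itself is assumed to be defined elsewhere (e.g. by the components pipeline).
    """
    if kind not in ("dimension", "measure", "attribute"):
        raise ValueError(f"unknown component kind: {kind}")
    return {
        "@type": "qb:ComponentSpecification",
        f"qb:{kind}": {"@id": property_uri},
    }


def dataset_annotation(components):
    """Build the DataSet node, with its DSD and component specifications nested."""
    return {
        "@id": DATASET_URI,
        "@type": "qb:DataSet",
        "qb:structure": {
            "@id": f"{DATASET_URI}/structure",
            "@type": "qb:DataStructureDefinition",
            "qb:component": [component_spec(k, uri) for k, uri in components],
        },
    }


if __name__ == "__main__":
    node = dataset_annotation([
        ("dimension", "http://example.org/def/dimension/gender"),
        ("measure", "http://example.org/def/measure/count"),
    ])
    print(json.dumps(node, indent=2))
```

Because each specification holds only an `@id` reference, the labels and ranges of the component properties stay in one place and aren't restated here.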