Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes

Make use of csvw validation features

Robsteranium opened this issue · comments

The csvw spec allows for some validation. You can, for example, include a "required" key for the relevant columns in a tableSchema.
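As a rough illustration of what a "required" key implies, here is a minimal Python sketch. The metadata, column names and the `missing_required` helper are all hypothetical (they are not table2qb's actual schema or API), and the sketch matches columns on their `titles` by simple string equality rather than implementing the full csvw validation rules.

```python
import csv
import io
import json

# Hypothetical csvw metadata: the file name and columns are illustrative only,
# not the schema that table2qb actually emits.
METADATA = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "observations.csv",
  "tableSchema": {
    "columns": [
      {"name": "geography", "titles": "Geography", "required": true},
      {"name": "value", "titles": "Value", "required": true},
      {"name": "note", "titles": "Note"}
    ]
  }
}
""")


def missing_required(metadata, csv_text):
    """Return (line_number, column_title) pairs for empty required cells.

    Assumption: headers are matched against each column's "titles" string
    exactly; a real csvw validator does considerably more than this.
    """
    required = [
        column["titles"]
        for column in metadata["tableSchema"]["columns"]
        if column.get("required")
    ]
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        for title in required:
            if not (row.get(title) or "").strip():
                errors.append((line_number, title))
    return errors
```

Calling `missing_required(METADATA, text)` on a csv whose "Value" cell is blank on some line would report that line and column, while a blank "Note" cell would pass because that column is not marked as required.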

We could look to adopt this in a couple of ways.

  1. We could make the output csvw stricter - including "required" keys for certain fields. This might allow us to catch some errors in csv2rdf. It might also help users to edit the csvw output from table2qb safely.
  2. We might look to specify table2qb's input requirements in csvw. The advantage of doing this is that the validation spec becomes an artifact in its own right - it would serve as executable documentation, allowing users to check the validity of their input from other tools (indeed, ONS had been creating their own csvlint specs for this purpose).

The first should be straightforward; the second is a little trickier. We introduce a csv parser in #102 which includes its own specification for input validity. It also involves some custom validating functions, transformations and defaults - these features probably aren't available as part of csvw's validation. One possible way to get the same benefits without necessarily adopting the standard would be to offer a validation task, i.e. one that just checks the inputs and doesn't transform the data.
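To make the idea of a validate-only task concrete, here is a small Python sketch. It is not the parser from #102: the `CHECKS` table, the predicates and the `validate_only` function are hypothetical names, and the rules shown (non-empty, numeric) are placeholders for whatever the real input specification requires.

```python
import csv
import io


# Hypothetical per-column predicates; the real rules would come from the
# input specification rather than being hard-coded like this.
def non_empty(value):
    return value.strip() != ""


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


CHECKS = {
    "Geography": [non_empty],
    "Value": [non_empty, is_number],
}


def validate_only(csv_text, checks=CHECKS):
    """Report problems in the input without transforming or emitting any data.

    Returns a list of (line_number, column, failed_check_name) tuples.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        for column, predicates in checks.items():
            value = row.get(column) or ""
            for predicate in predicates:
                if not predicate(value):
                    problems.append((line_number, column, predicate.__name__))
    return problems
```

An empty result would mean the input passed every check. Because the function only reads and reports, it could be run ahead of the transforming pipeline, or separately from it.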