Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes

Not possible to create labels in multiple languages

BillSwirrl opened this issue

From Paul Hermans: in Belgium, where there are 3 official languages, it is often necessary to provide multiple labels for concepts, dimension properties etc. table2qb currently only allows creation of a single label per generated concept.

It looks like csv2rdf gives us 3 options for setting the language - a global default, a per column setting, and a content-type header.

The default would require us to change each JSON metadata block, e.g.:

{ "@context" ["http://www.w3.org/ns/csvw" {"@language" "nl"}], ... }

A column setting would require us to add this to the column spec, e.g.:

{ "tableSchema"
  { "columns"
    [{"name" "label",
      "titles" "label",
      "datatype" "string",
      "lang" "nl",
      "propertyUrl" "rdfs:label"}, ... ]}}
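To make the per-column option concrete, here is a minimal sketch of generating one label column per language. It is written in Python purely for illustration (table2qb itself is Clojure), and the `label_columns` helper and the `label_nl`-style column names are hypothetical, not anything table2qb currently provides.

```python
import json


def label_columns(languages, base_name="label"):
    """Build one CSVW column description per language tag.

    The "<base_name>_<lang>" naming convention is an assumption made
    for this sketch; only the "lang" property comes from the CSVW
    metadata vocabulary discussed above.
    """
    return [
        {
            "name": f"{base_name}_{lang}",
            "titles": f"{base_name}_{lang}",
            "datatype": "string",
            "lang": lang,
            "propertyUrl": "rdfs:label",
        }
        for lang in languages
    ]


# One rdfs:label column each for Dutch, French and German.
schema = {"tableSchema": {"columns": label_columns(["nl", "fr", "de"])}}
print(json.dumps(schema, indent=2))
```

Every generated column shares the same `propertyUrl`, so each row would end up with one `rdfs:label` per language on the same subject.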

To provide a Content-Type header on the data we'd need to implement #59 and ensure that these headers were retained from the response the user provided to table2qb's csv request, to the response table2qb provided to csv2rdf's csvw request.

The csvw-metadata default would be the simplest option.

We could implement this by introducing a global language setting for table2qb. This would mean re-initialisation would be required to swap languages. That would be fine for the command-line mode, but where table2qb is provided as a service (e.g. via grafter-server) a nicer alternative would be to make this a pipeline-parameter, so that it could be changed at run-time.

A csvw-metadata default would only allow us to create lang-strings in a single language other than English, though. In order to provide multiple translations for the same URI we'd need to first implement #69, otherwise the inputs "Vrouw"@nl / "Femelle"@fr / "Weiblich"@de would lead to 3 different concepts.
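A rough illustration of why the translations diverge, assuming concept URIs are derived by slugging the label. The `slugify` rule and the `http://example.org/...` scheme URI below are stand-ins invented for this sketch (in Python, for illustration only) and are not table2qb's actual implementation.

```python
import re


def slugify(label):
    """Lower-case a label and collapse non-alphanumeric runs to hyphens.

    This is an assumed, simplified slug rule used only for this sketch.
    """
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def concept_uri(scheme_uri, label):
    """Derive a concept URI from its label (hypothetical helper)."""
    return f"{scheme_uri}/{slugify(label)}"


SCHEME = "http://example.org/def/concept/sex"  # hypothetical scheme URI
labels = {"nl": "Vrouw", "fr": "Femelle", "de": "Weiblich"}

# Each translation is slugged independently, so each gets its own URI.
uris = {lang: concept_uri(SCHEME, text) for lang, text in labels.items()}
for lang, uri in uris.items():
    print(lang, uri)
```

Because the URI depends on the label text, three translations give three subjects rather than three labels on one subject.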

The csvw:lang attribute would allow us to add more label columns (one per language, potentially only using the English version in the URI, avoiding the dependency on #69) or a label-language column, but this breaks 3NF. We'd also need to do the same for description and any other future string-literal columns.

I suggest we read the LANG environment variable on initialisation (falling back to English) and then set this as the default language for the csvw metadata blocks.
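As a sketch of what that could look like (in Python for illustration; the function names are hypothetical and the handling of POSIX locale strings such as `nl_BE.UTF-8` is my assumption about what LANG would typically contain):

```python
import os


def default_language(environ=None, fallback="en"):
    """Derive a language tag from LANG, falling back to English.

    Assumes LANG follows the usual POSIX shape
    language[_TERRITORY][.codeset][@modifier], e.g. "nl_BE.UTF-8".
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("LANG", "")
    # Drop the codeset and modifier, then keep only the language part.
    locale_name = raw.split(".", 1)[0].split("@", 1)[0]
    language = locale_name.split("_", 1)[0].lower()
    # "C" and "POSIX" are locale names, not languages.
    if not language or language in ("c", "posix"):
        return fallback
    return language


def csvw_context(language):
    """Build a csvw @context carrying the default language."""
    return ["http://www.w3.org/ns/csvw", {"@language": language}]


print(csvw_context(default_language({"LANG": "nl_BE.UTF-8"})))
print(csvw_context(default_language({})))
```

Passing the environment in as an argument keeps the lookup easy to test, and would also make it straightforward to override the value with a pipeline parameter later.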

Would that work for you @paulzh (assuming we'd also resolved #69)?