Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes

Some rdfs:label do not have a language tag

zeginis opened this issue · comments

A language tag (e.g. '@en') does not exist at:

  • the qb:DimensionProperty example
  • the qb:MeasureProperty example
  • the members of a skos:ConceptScheme example

The system has no knowledge of language: it doesn't know which language is used in the strings it receives.

This might be a bit tricky to resolve on a per-cell basis as there's not really any way to declare this within a csv file. I suppose we could add a language column for each string-valued column but that doesn't seem very satisfactory (as this is metadata not data).

The csvw:lang property can be applied to a column, table or table-group. We could extend table2qb to accept optional csvw metadata about the input csv files. This would be a reasonably large change.

A simpler change might be to make this an application-level setting, applied to all string literals sent to csv2rdf.
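
To illustrate the application-level option, here is a minimal Python sketch. It is not table2qb or csv2rdf code: `Literal`, `tag_literal` and the `default_lang` parameter are hypothetical names, and the sketch assumes a single language setting is applied to every string literal before it is serialised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF string literal."""
    value: str
    lang: Optional[str] = None

    def to_turtle(self) -> str:
        # Escape backslashes and double quotes for a Turtle short string.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        if self.lang:
            return f'"{escaped}"@{self.lang}'
        return f'"{escaped}"'


def tag_literal(value: str, default_lang: Optional[str]) -> Literal:
    """Attach the application-level language (if any) to a plain string."""
    return Literal(value, default_lang)


print(tag_literal("Employment", "en").to_turtle())  # "Employment"@en
print(tag_literal("Employment", None).to_turtle())  # "Employment"
```

With the setting left unset (`None`), every label comes out without a tag, so the output stays consistent either way.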

The problem is that the DataSet label (and some other labels) have a language tag, e.g.
at https://github.com/Swirrl/table2qb/tree/master/examples/employment/ttl :

<http://statistics.gov.scot/data/employment> a <http://www.w3.org/ns/csvw#Table> ;
	<http://www.w3.org/ns/csvw#url> <file:/var/folders/dr/0rl25prn4jqc59p92w22qj6w0000gp/T/component-specifications595063391289823660.csv> ;
	dcterms:title "Employment"@en ;
	rdfs:label "Employment"@en ;
	<http://www.w3.org/ns/csvw#row> _:row183 .

<http://statistics.gov.scot/data/employment> a qb:DataSet .

This causes compatibility issues with CubiQL because we use an application-level setting to define the language. So CubiQL requires:

  • either all labels to have a language tag
  • or all labels to have no language tag
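
The all-or-nothing requirement above can be expressed as a small check. This is only a sketch in Python: `labels_are_consistent` is a hypothetical helper, not part of CubiQL or table2qb, and it assumes labels have already been extracted as (text, language) pairs with `None` meaning "no language tag".

```python
from typing import Iterable, Optional, Tuple


def labels_are_consistent(labels: Iterable[Tuple[str, Optional[str]]]) -> bool:
    """Return True if every label is tagged, or if none of them is."""
    tagged = [lang is not None for _, lang in labels]
    return all(tagged) or not any(tagged)


# Mixed output like the current examples: the DataSet label is tagged,
# a dimension label is not.
mixed = [("Employment", "en"), ("Gender", None)]
print(labels_are_consistent(mixed))  # False
```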

Ah I see. Indeed the system does know about language insofar as the json-ld statements in the csvw metadata are concerned. Thanks for the clarification.

Some of these strings are set by the incoming csv or pipeline parameters (which could be subject to application-level config) but others - e.g. "Components Ontology" - are hard-coded. That would complicate internationalisation (to do this comprehensively, you'd need to provide translations for all of the internal strings). We could read these from a static translations resource.

Perhaps we should just set everything to English for now, then return to a proper internationalisation later. Would that resolve your immediate issue or do you need to use a different language?

@zeginis - Is this still an issue after Swirrl/cubiql#112 ? CubiQL falls back to using strings without a language if none is available with the configured language. Or does table2qb generate strings with the @en tag, which could differ from the configured language? In that case we could change the fallback behaviour to be configured language -> en -> no language. Would that work for you?
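
For reference, the proposed fallback order (configured language -> en -> no language) could look roughly like this. It is a Python sketch only, not CubiQL's actual implementation: `pick_label` is a hypothetical function, and it assumes the labels of one resource are held in a dict keyed by language tag, with `None` as the key for an untagged label.

```python
from typing import Dict, Optional, Sequence


def pick_label(
    labels: Dict[Optional[str], str],
    configured: str,
    fallbacks: Sequence[Optional[str]] = ("en", None),
) -> Optional[str]:
    """Try the configured language first, then each fallback in order."""
    for lang in (configured, *fallbacks):
        if lang in labels:
            return labels[lang]
    return None  # no label in any acceptable language


labels = {"en": "Employment", None: "Employment (untagged)"}
print(pick_label(labels, "el"))  # Employment
```

Passing `fallbacks=(None,)` gives the current behaviour described above, where only the untagged string is used when the configured language is missing.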

@lkitching the language fallback of configured language -> no language is OK. I have reported a bug in CubiQL's language fallback: Swirrl/cubiql#128