oeg-upm / yatter

Translate YARRRML into easy-to-read [R2]RML mappings

Home Page: https://doi.org/10.5281/zenodo.7024500

XLSX source is not supported

neobernad opened this issue

Hi @dachafra,

Opening up a new issue as suggested in #77.

I have the following YARRRML file (actually it is longer and I have anonymized it):

prefixes:
  ds_data: https://example.com/data/MyData/
  ds_property: https://example.com/MyProperty/
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  grel: http://users.ugent.be/~bjdmeest/function/grel.ttl#
  morph-kgc: https://github.com/morph-kgc/morph-kgc/function/built-in.ttl#

sources:
  MyDataSource:
    - data.xlsx

mappings:
  PersonMappings:
    sources: MyDataSource

    s: ds_data:$(person_name)
    po:
      - [rdf:type, ds_property:Person]
      - [rdfs:label, $(person_name), xsd:string]

Translating the file above fails with the error "ERROR: The YARRRML mapping has not been translated" when I run:

import yatter
from ruamel.yaml import YAML
yaml = YAML(typ='safe', pure=True)
rml_content = yatter.translate(yaml.load(open("mappings.yml")))

Here, rml_content is None.

What could be wrong? After a quick debugging session in yatter, it seems that the YARRRML file is opened and parsed properly, so perhaps something in the translate or get_non_asserted_mappings methods does not like the structure of the mappings?

Thanks,
José Antonio

Hi José Antonio,

Your mapping is missing the referenceFormulation. I understand it is CSV, right? (https://rml.io/yarrrml/spec/#reference-formulation)

prefixes:
  ds_data: https://example.com/data/MyData/
  ds_property: https://example.com/MyProperty/
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  grel: http://users.ugent.be/~bjdmeest/function/grel.ttl#
  morph-kgc: https://github.com/morph-kgc/morph-kgc/function/built-in.ttl#

sources:
  MyDataSource:
    - data.xlsx~csv

mappings:
  PersonMappings:
    sources: MyDataSource

    s: ds_data:$(person_name)
    po:
      - [rdf:type, ds_property:Person]
      - [rdfs:label, $(person_name), xsd:string]
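A cheap way to catch this before calling yatter.translate is to check that every source declares a reference formulation. The sketch below is a hypothetical pre-flight helper (sources_missing_formulation is not part of yatter's API). It assumes the mapping has already been loaded into a dict, for example with ruamel.yaml as in the original report, and it only understands three source forms: the "path~formulation" shortcut, an array whose first item uses that shortcut, and a long-form source with a referenceFormulation key.

```python
from typing import Any

# Hypothetical pre-flight check, not part of yatter's API.


def _declares_formulation(source: Any) -> bool:
    """Return True if one YARRRML source entry names a reference formulation."""
    if isinstance(source, str):
        # Shortcut syntax "path~formulation", e.g. "data.xlsx~csv".
        # rpartition splits on the last "~" (a simplification: a path that
        # contains "~" but has no formulation would still be misread).
        _, sep, formulation = source.rpartition("~")
        return bool(sep) and bool(formulation)
    if isinstance(source, list):
        # Array form, e.g. ["data.xlsx~csv"]: the first item carries path~formulation.
        return bool(source) and _declares_formulation(source[0])
    if isinstance(source, dict):
        # Long form, e.g. {"access": "data.csv", "referenceFormulation": "csv"}.
        return bool(source.get("referenceFormulation"))
    return False


def sources_missing_formulation(yarrrml: dict) -> list[str]:
    """Names of top-level sources that declare no reference formulation."""
    sources = yarrrml.get("sources") or {}
    return [name for name, src in sources.items() if not _declares_formulation(src)]


if __name__ == "__main__":
    original = {"sources": {"MyDataSource": ["data.xlsx"]}}
    fixed = {"sources": {"MyDataSource": ["data.xlsx~csv"]}}
    print(sources_missing_formulation(original))  # ['MyDataSource']
    print(sources_missing_formulation(fixed))     # []
```

Only the top-level sources section is inspected; sources written inline inside a mapping would need the same check applied to each mapping's sources entry.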

Hi @neobernad, is this what you expect from the output?

@prefix ds_data: <https://example.com/data/MyData/>.
@prefix ds_property: <https://example.com/MyProperty/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#>.
@prefix morph-kgc: <https://github.com/morph-kgc/morph-kgc/function/built-in.ttl#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix schema: <http://schema.org/>.
@prefix formats: <http://www.w3.org/ns/formats/>.
@prefix comp: <http://semweb.mmlab.be/ns/rml-compression#>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#>.
@base <http://example.com/ns#>.


<PersonMappings_0> a rr:TriplesMap;

	rml:logicalSource [
		a rml:LogicalSource;
		rml:source "data.xlsx";
		rml:referenceFormulation ql:CSV
	];
	rr:subjectMap [
		a rr:SubjectMap;
		rr:template "https://example.com/data/MyData/{person_name}";
	];
	rr:predicateObjectMap [
		rr:predicateMap [
			a rr:PredicateMap;
			rr:constant rdf:type;
		];
		rr:objectMap [
			a rr:ObjectMap;
			rr:constant ds_property:Person;
		];
	];
	rr:predicateObjectMap [
		rr:predicateMap [
			a rr:PredicateMap;
			rr:constant rdfs:label;
		];
		rr:objectMap [
			a rr:ObjectMap;
			rml:reference "person_name";
			rr:datatype xsd:string
		];
	].

Hi @dachafra,

Firstly, thanks for the quick response :-).

I forgot to mention that I had removed the referenceFormulation, which was previously xlsx, because it triggered an error.

The output you suggest is what I would expect! Should I convert every Excel file into CSV, then?

If you want to use the Excel referenceFormulation, you need to use its engine and extension: https://www.dfki.uni-kl.de/~mschroeder/demo/excel-rml/. However, I guess you want to use an XLSX file without converting it to CSV, and the behavior would be the same: the file is expected to be parsed row by row, just as a CSV would be (that is what the referenceFormulation means). If this output is what you expect, it is already solved and I'll push the changes.
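The "row by row, as with a CSV" reading can be sketched as follows. This is only an illustration, not yatter's or any RML engine's code: yatter produces the mapping, and whichever engine executes it does the actual reading of the file. The hypothetical function person_triples applies the generated PersonMappings_0 by hand, and an in-memory CSV stands in for the rows of data.xlsx.

```python
import csv
import io
from typing import Iterator, TextIO
from urllib.parse import quote

# IRIs expanded from the prefixes used in the mapping.
DS_DATA = "https://example.com/data/MyData/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PERSON = "https://example.com/MyProperty/Person"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def person_triples(table: TextIO) -> Iterator[tuple]:
    """Apply PersonMappings_0 to a tabular source, one row at a time (illustrative)."""
    for row in csv.DictReader(table):
        name = row["person_name"]
        # rr:template ".../MyData/{person_name}"; quote() only approximates
        # the IRI-safe escaping that R2RML requires for template values.
        subject = DS_DATA + quote(name, safe="")
        # Constant predicate and object: rdf:type ds_property:Person.
        yield (subject, RDF_TYPE, PERSON)
        # rml:reference "person_name" typed with rr:datatype xsd:string.
        yield (subject, RDFS_LABEL, (name, XSD_STRING))


if __name__ == "__main__":
    # An in-memory CSV stands in for the rows of data.xlsx.
    sheet = io.StringIO("person_name\nAlice\nBob Smith\n")
    for triple in person_triples(sheet):
        print(triple)
```

Each row yields one rdf:type triple and one typed rdfs:label triple, which is why the same mapping works whether the engine reads those rows from a CSV file or from an XLSX sheet.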

XLSX files are now supported with the CSV referenceFormulation.

Sorry for the late response; I could not work on this topic until now.

I have tested it with the latest version of the repository and with CSV referenceFormulation. It works like a charm, thank you! :-)