Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes

FileNotFoundException

zeginis opened this issue

I ran the components-pipeline:

java -jar table2qb.jar exec components-pipeline --input-csv components.csv --base-uri http://statistics.gov.scot/ --output-file components.ttl

The components.csv file is: https://github.com/Swirrl/table2qb/blob/master/examples/employment/csv/components.csv

I get a FileNotFoundException:

C:\reusetable2qbdatawithcubiql>java -jar table2qb.jar exec components-pipeline --input-csv components.csv --base-uri http://statistics.gov.scot/ --output-file components.ttl
At path ["url"]: Illegal character in opaque part at index 2: C:\Users\user\AppData\Local\Temp\components5371379411295667911.csv
C:\reusetable2qbdatawithcubiql\meta.json (The system cannot find the file specified)
java.io.FileNotFoundException: C:\reusetable2qbdatawithcubiql\meta.json 
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(Unknown Source)
        at java.io.FileInputStream.<init>(Unknown Source)
        at clojure.java.io$fn__10972.invokeStatic(io.clj:238)
        at clojure.java.io$fn__10972.invoke(io.clj:235)
        at clojure.java.io$fn__10881$G__10874__10888.invoke(io.clj:69)
        at clojure.java.io$fn__10976.invokeStatic(io.clj:248)
        at clojure.java.io$fn__10976.invoke(io.clj:248)
        at clojure.java.io$fn__10881$G__10874__10888.invoke(io.clj:69)
        at clojure.java.io$input_stream.invokeStatic(io.clj:136)
        at clojure.java.io$input_stream.doInvoke(io.clj:121)
        at clojure.lang.RestFn.invoke(RestFn.java:410)
        at csv2rdf.source$fn__4969.invokeStatic(source.clj:73)
        at csv2rdf.source$fn__4969.invoke(source.clj:72)
        at clojure.lang.MultiFn.invoke(MultiFn.java:229)
        at csv2rdf.source$fn__4973.invokeStatic(source.clj:81)
        at csv2rdf.source$fn__4973.invoke(source.clj:75)
        at csv2rdf.source$fn__4949$G__4944__4954.invoke(source.clj:54)
        at csv2rdf.tabular.csv.reader$read_tabular_source.invokeStatic(reader.clj:203)
        at csv2rdf.tabular.csv.reader$read_tabular_source.invoke(reader.clj:202)
        at csv2rdf.tabular.csv.reader$read_rows.invokeStatic(reader.clj:228)
        at csv2rdf.tabular.csv.reader$read_rows.invoke(reader.clj:222)
        at csv2rdf.tabular.csv$extract_embedded_metadata.invokeStatic(csv.clj:89)
        at csv2rdf.tabular.csv$extract_embedded_metadata.invoke(csv.clj:83)
        at csv2rdf.tabular.processing$validate_merge_table.invokeStatic(processing.clj:15)
        at csv2rdf.tabular.processing$validate_merge_table.invoke(processing.clj:14)
        at csv2rdf.tabular.processing$get_metadata$fn__7853.invoke(processing.clj:28)
        at clojure.core$mapv$fn__8088.invoke(core.clj:6832)
        at clojure.lang.PersistentVector.reduce(PersistentVector.java:341)
        at clojure.core$reduce.invokeStatic(core.clj:6747)
        at clojure.core$mapv.invokeStatic(core.clj:6823)
        at clojure.core$mapv.invoke(core.clj:6823)
        at csv2rdf.tabular.processing$get_metadata.invokeStatic(processing.clj:28)
        at csv2rdf.tabular.processing$get_metadata.invoke(processing.clj:20)
        at csv2rdf.csvw$csv__GT_rdf.invokeStatic(csvw.clj:27)
        at csv2rdf.csvw$csv__GT_rdf.invoke(csvw.clj:18)
        at table2qb.core$components__GT_csvw__GT_rdf.invokeStatic(core.clj:507)
        at table2qb.core$components__GT_csvw__GT_rdf.invoke(core.clj:502)
        at table2qb.core$components_pipeline.invokeStatic(core.clj:514)
        at table2qb.core$components_pipeline.invoke(core.clj:509)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.lang.Var.applyTo(Var.java:702)
        at clojure.core$apply.invokeStatic(core.clj:657)
        at clojure.core$apply.invoke(core.clj:652)
        at table2qb.main$exec_pipeline$fn__8604.invoke(main.clj:144)
        at table2qb.main$exec_pipeline.invokeStatic(main.clj:142)
        at table2qb.main$exec_pipeline.invoke(main.clj:140)
        at table2qb.main$fn__8615.invokeStatic(main.clj:171)
        at table2qb.main$fn__8615.invoke(main.clj:159)
        at clojure.lang.MultiFn.invoke(MultiFn.java:238)
        at table2qb.main$inner_main.invokeStatic(main.clj:187)
        at table2qb.main$inner_main.invoke(main.clj:180)
        at table2qb.main$_main.invokeStatic(main.clj:198)
        at table2qb.main$_main.doInvoke(main.clj:197)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at table2qb.main.main(Unknown Source)

The content of components5371379411295667911.csv is:

label,description,component_type,codelist,notation,component_type_slug,property_slug,class_slug,parent_property
Gender,The state of being male or female,qb:DimensionProperty,http://statistics.gov.scot/def/concept-scheme/gender,gender,dimension,gender,Gender,
Count,Total number of items,qb:MeasureProperty,,count,measure,count,Count,http://purl.org/linked-data/sdmx/2009/measure#obsValue

My operating system is Windows. Does this matter?

I also get a similar exception with the codelist-pipeline and the cube-pipeline.

This appears to be related to the way Windows paths are converted into URIs. table2qb constructs the metadata file URI by converting the tabular file path to a URI and resolving the metadata URI relative to it. The backslashes in a Windows path are percent-encoded, so the whole path becomes a single path segment, which the resolution then removes, e.g.

(.toURI (io/file "C:\\Users\\user\\AppData\\Local\\Temp\\components5371379411295667911.csv"))
=> "file:/Users/lee/src/table2qb/C:%5CUsers%5Cuser%5CAppData%5CLocal%5CTemp%5Ccomponents5371379411295667911.csv"

resolving against this URI results in:

 file:/Users/lee/src/table2qb/meta.json

The URI should refer to a file in the same directory as the temporary tabular file (i.e. C:\Users\user\AppData\Local\Temp\meta.json).
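
The resolution step can be reproduced on any platform with plain java.net.URI. This is only a minimal sketch, not table2qb code: the class name ResolveDemo and the helper resolveMetadata are hypothetical, and the input URI is a shortened stand-in for the one shown above, with the Windows path already percent-encoded into a single final segment.

```java
import java.net.URI;

class ResolveDemo {
    // Hypothetical helper: resolve "meta.json" relative to the tabular file's URI.
    // URI resolution replaces the last path segment of the base URI, so if the
    // whole Windows path is that last segment, all of it is discarded.
    static URI resolveMetadata(String tabularUri) {
        return URI.create(tabularUri).resolve("meta.json");
    }

    public static void main(String[] args) {
        // The backslashes appear as %5C, so "C:%5CUsers%5C...csv" is one segment.
        String tabular = "file:/Users/lee/src/table2qb/"
                + "C:%5CUsers%5Cuser%5CAppData%5CLocal%5CTemp%5Ccomponents.csv";
        System.out.println(resolveMetadata(tabular));
        // prints file:/Users/lee/src/table2qb/meta.json
    }
}
```

The result points at the working directory rather than the Temp directory, which matches the meta.json path in the FileNotFoundException.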

It may be possible to change the way the relative metadata file URI is generated to first resolve the metadata path before converting to a URI.
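
As a rough illustration of that idea (not the change that was eventually made in table2qb), the sibling file can be resolved on the filesystem path first and only then converted to a URI. The class MetadataUriSketch and the helper metadataUriFor are hypothetical names, and the file name meta.json is taken from the error above.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class MetadataUriSketch {
    // Hypothetical helper: find meta.json next to the tabular file using path
    // operations, which understand the platform's separators, and convert the
    // result to a URI as the last step.
    static URI metadataUriFor(Path tabularFile) {
        return tabularFile.toAbsolutePath().resolveSibling("meta.json").toUri();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the temporary tabular file created by the pipeline.
        Path csv = Files.createTempFile("components", ".csv");
        try {
            URI meta = metadataUriFor(csv);
            System.out.println(meta);
            // Check that the metadata URI lives in the same directory as the CSV.
            Path metaDir = Paths.get(meta).getParent();
            System.out.println(metaDir.equals(csv.toAbsolutePath().getParent()));
        } finally {
            Files.deleteIfExists(csv);
        }
    }
}
```

Because the sibling is computed before any URI encoding happens, the result does not depend on how backslashes are escaped.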

@lkitching I found a Linux machine to work on, so there is no need to prioritize this issue. It is not blocking me.

I am having the same issue, but I don't have access to a Linux machine at the moment.

@lkitching is this issue easy to fix?

@keeganmcbride it also works on a Mac. Can you get access to one?

@zeginis @keeganmcbride - I've pushed a fix to the branch issue_77. Can you let me know if this fixes the issue for you?

@lkitching this seems to work. I pulled the branch, recompiled, and am now able to generate the files correctly.

Thanks all.