jazzyray / rdf2rml

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

1 Introduction

See this presentation:

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video

RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.

We describe a tool rdfpuml that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation.

If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool, rdf2rml, generates R2RML transformations from such examples, saving about 15x in complexity.

See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.

2 Documentation

Source: ./doc/rdfpuml.pod, ./doc/rdf2rml.pod

3 Installation

Check out this repo and add rdf2rml/bin to your path. Install the following prerequisites:

  • both tools: Perl. Tested with version 5.22 on Windows (cygwin and Strawberry).
  • rdfpuml:
    • GraphViz
    • PlantUML. You need a recent version for new features like arrow length and color. I’m currently running 1.2018.10beta7. See in particular plantuml class diagrams.
    • Perl modules: use cpan or cpanm to install them: RDF::Trine RDF::Query Encode FindBin Carp::Always Slurp
    • RDF::Prefixes::Curie. This is my own module located in ./lib, and rdfpuml needs FindBin to locate it.
  • rdf2rml:
    • Apache Jena: riot, update. Tested with version 3.1.0 of 2016-05-10.
    • cat, grep, rm
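
Before running either tool, it can help to confirm that the command-line prerequisites are reachable on your path. The following is a minimal sketch, not part of this repo: `missing_tools` is a hypothetical helper, and the command names passed to it are assumptions to check against your own installation (for example, that GraphViz is exposed as `dot` and Jena as `riot` and `update`).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rdf2rml): print every named
# command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For instance, `missing_tools perl dot riot update cat grep rm` lists only the commands that still need installing. PlantUML and the Perl modules are not covered by this check, since they are usually a jar file and library code rather than commands on the path.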

4 Evolution

Help needed for the following tasks. Post bugs and enhancement requests to this repo!

Jonas Smedegaard (@jonassmedegaard, dr at jones fullstop dk) has volunteered, so changes should happen soon. Development is at https://salsa.debian.org/debian/rdf2rml/branches.

To adopt changes, do something like this. To merge all commits in the salsa/develop branch:

cd rdf2rml    # i.e. your local clone of your Github project
git remote add salsa https://salsa.debian.org/debian/rdf2rml.git
git fetch salsa
git merge salsa/develop

To adopt only single commits from the salsa/develop branch, do something like this instead (replacing the final command above; “git remote add” is needed only the first time):

git cherry-pick $commit1 $commit2 $commit3

4.1 Done

4.2 Near-term

  • Modularize and package better.
  • Release on CPAN
  • Add Unicode tests (ttl with non-ASCII chars: Cyrillic, French, etc)
  • Eliminate the dependency of rdfpuml on ./lib/RDF/Prefixes/Curie.pm once perlrdf#131 is fixed

4.2.1 Batch Processing

#1: Batch a number of ttl files to one puml file. Rationale: plantuml is slow to start up, so putting several diagrams in one file will make things faster:

@startuml file1.png
  ' made from file1.ttl
@enduml
@startuml file2.png
  ' made from file2.ttl
@enduml

  • However, this interferes with make processing that regenerates only png for changed ttl files. So we need a smarter outer script or Makefile that batches up only the changed ttl for processing.
  • rdfpuml should take multiple input files, and write a single output
  • It would also be useful to take a whole folder of ttl files as input
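
The outer script mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not existing functionality: `changed_ttl` and `batch_puml` are hypothetical helper names, each per-diagram puml file is assumed to start with an `@startuml` line, and the step that turns a ttl file into a puml file is deliberately left out (see ./doc/rdfpuml.pod for how to invoke rdfpuml).

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of rdf2rml.

# Print each DIR/*.ttl whose matching .png is missing or older.
changed_ttl() {
  for ttl in "$1"/*.ttl; do
    [ -e "$ttl" ] || continue          # glob matched nothing
    png="${ttl%.ttl}.png"
    if [ ! -e "$png" ] || [ "$ttl" -nt "$png" ]; then
      echo "$ttl"
    fi
  done
}

# Concatenate per-diagram .puml files into OUT, renaming each diagram
# so that NAME.puml becomes "@startuml NAME.png" in the batch file.
# Assumption: the first line of every input file is "@startuml...".
batch_puml() {
  out=$1; shift
  : > "$out"                           # start with an empty batch file
  for puml in "$@"; do
    name=$(basename "$puml" .puml)
    sed "1s|^@startuml.*|@startuml $name.png|" "$puml" >> "$out"
  done
}
```

A wrapper would run rdfpuml on each file printed by `changed_ttl`, pass the resulting puml files to `batch_puml`, and then start plantuml once on the combined file. Comparing timestamps in the script keeps the "only changed files" behaviour that make would otherwise provide.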

4.3 Mid-Term

  • Upgrade to use Attean instead of Trine (Perl RDF)
  • Integrate in Emacs org-mode: write Turtle, see diagram (easy to do)
  • Enhance rdfpuml to allow node colors, icons and tooltips (see ./ideas)
  • Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)

4.4 Long-Term

  • Extend rdf2rml to describe & generate RDF Shapes
  • Another tool to visualize RDF Shapes (SHACL and ShEx)
  • R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate:
    • RML: extends R2RML to handle RDB, XML, JSON, CSV
    • XSPARQL: extends XQuery with SPARQL construct and JSON input
    • tarql: handles TSV/CSV with SPARQL construct

5 Citation

If you use this software, please cite it:

RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video.

@InProceedings{Alexiev-rdfpuml-rdf2rml,
  author       = {Vladimir Alexiev},
  title        = {{RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation}},
  booktitle    = {Semantic Web in Libraries 2016 (SWIB 16)},
  year         = 2016,
  month        = nov,
  address      = {Bonn, Germany},
  url_Slides   = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index.html},
  url_HTML     = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/index-full.html},
  keywords     = {RDF, visualization, PlantUML, cultural heritage, NLP, NIF, EHRI, R2RML, generation, model-driven, RDF by Example, rdfpuml, rdf2rml},
  url_PDF      = {http://rawgit2.com/VladimirAlexiev/my/master/pres/20161128-rdfpuml-rdf2rml/RDF_by_Example.pdf}, 
  url_Video    = {https://youtu.be/4WoYlaGF6DE},
  type         = {presentation},
  abstract     = {RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required. We describe a tool *rdfpuml* that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation. If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool *rdf2rml* generates R2RML transformations from such examples, saving about 15x in complexity.},
}

6 Related Work
