Notes

The GeoNames dataset provides information such as area in sq km, country ISO codes, phone code, currency...
The WorldBank dataset provides the yearly GDP value in US$ of all countries (from 1980 to 2018). Two files are available for WorldBank:
- dataset-worldbank-gdp.xml contains the yearly GDP of a few dozen countries: RML should process it fairly quickly
- dataset-worldbank-gdp-full.xml contains the full dataset (all countries): RML can take more than 30 min to process it

We advise you to use dataset-worldbank-gdp.xml when testing. Once you have found the right configuration, you can run it on the full XML file with all countries.
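
Before writing a mapping, it can help to see which elements an XML file actually contains. The following is a minimal sketch using only the Python standard library; the element names in the sample (`Root`, `data`, `record`, `field`) are assumptions made for illustration, so check them against the real files.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count how often each element tag occurs, streaming through the XML."""
    counts = Counter()
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's content to keep memory use low
    return counts


# Hypothetical miniature input; the real files may use other element names.
SAMPLE = b"""<Root><data>
  <record><field name="Country">Aruba</field><field name="Year">1980</field></record>
  <record><field name="Country">Aruba</field><field name="Year">1981</field></record>
</data></Root>"""

if __name__ == "__main__":
    for tag, n in count_tags(io.BytesIO(SAMPLE)).most_common():
        print(tag, n)
```

`count_tags` also accepts a file path, for example `count_tags("dataset-worldbank-gdp.xml")`.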
Useful links
- Ontologies
  - GeoNames ontology
  - DBpedia ontology
- SPARQL specifications
- Find the URL for a prefix: http://prefix.cc

Download the required files

The easiest way to download the repository is to clone it using git:

git clone https://github.com/MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs.git
cd UM_KEN4256_KnowledgeGraphs/

You can also download it as a .zip file.

Execute RML mapping files

You can check whether Java is installed by opening a terminal (or PowerShell on Windows) and typing:

java -version

If Java is not installed, you can install version 8 from the Java website.

Download the RML processor rmlmapper.jar and put it in the UM_KEN4256_KnowledgeGraphs folder, then execute the example mapping file:

java -jar rmlmapper.jar -m "mapping.ttl" -o "output.nt" --duplicates

This command should be executed in the directory where the rmlmapper.jar file and RDF files are located (this repository).
The --duplicates flag removes duplicate triples from the output file.
The example mapping.ttl file is available to help you start converting the first columns.
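
If you rerun the mapper often while adjusting the mapping, a small wrapper can save typing. This is only a sketch: `build_rmlmapper_command` and `run_rmlmapper` are hypothetical helper names, and the arguments are the ones from the command above (`-m`, `-o`, `--duplicates`).

```python
import shutil
import subprocess


def build_rmlmapper_command(mapping="mapping.ttl", output="output.nt",
                            jar="rmlmapper.jar", remove_duplicates=True):
    """Return the argument list for the rmlmapper invocation shown above."""
    command = ["java", "-jar", jar, "-m", mapping, "-o", output]
    if remove_duplicates:
        command.append("--duplicates")
    return command


def run_rmlmapper(**kwargs):
    """Run rmlmapper from the current directory.

    Raises RuntimeError if java is not on PATH, and CalledProcessError
    if the mapper exits with a non-zero status.
    """
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH; install Java first")
    subprocess.run(build_rmlmapper_command(**kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_rmlmapper() to execute it.
    print(" ".join(build_rmlmapper_command()))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.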

Running the rmlmapper on the full DrugBank dataset can take about 40 minutes. Let us know if your computer cannot handle it.

Install GraphDB

Download and install

Download GraphDB (register to receive an email with the download links)
Install from exe, dmg, deb or rpm depending on your operating system.
Access it at http://localhost:7200/

Create a repository (triplestore)

Setup > Repositories > Create new repository
- Enter the repository ID you want (only mandatory field here)
- Create
- Try out the other parameters (the Context index is recommended if you use multiple graphs)

Users

Enabling security and user management is not necessary when running GraphDB locally. Contact us if you have issues with it.

Explore

GraphDB offers various modules that can be useful to visualize or process data, such as the class hierarchy visualization or OntoRefine.

Interlink datasets with LIMES

Download the jar file for LIMES release 1.7.1.

An example LIMES config file is provided in the repository: see limes_config.xml

java -jar limes-core-1.7.1.jar limes_config.xml

See the official LIMES documentation for more details on its options, such as the available metrics and thresholds.

Or try out the LIMES Web UI: http://limes.aksw.org/
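
To make the idea of a metric and a threshold concrete, here is a simplified sketch of a trigram string similarity with an acceptance threshold. It is not LIMES's own implementation, and the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
def trigrams(text):
    """Return the set of character trigrams of a lower-cased, padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient over trigram sets: 1.0 if identical, 0.0 if disjoint."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def link(source_labels, target_labels, threshold=0.8):
    """Return (source, target, score) for pairs that reach the threshold."""
    links = []
    for s in source_labels:
        for t in target_labels:
            score = trigram_similarity(s, t)
            if score >= threshold:
                links.append((s, t, score))
    return links


if __name__ == "__main__":
    for s, t, score in link(["Netherlands", "Japan"], ["Netherland", "Japan"]):
        print(f"{s} <-> {t}: {score:.2f}")
```

Raising the threshold yields fewer but more reliable links; lowering it finds more candidates at the cost of more false matches.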

Using other tools (optional)

Conversion can be done using various other tools and methods. You are encouraged to use tools other than RML mapper and LIMES if they fit the task. Here are some examples of other tools to convert structured data to RDF. They usually require more proficiency with programming and deploying services on your machine than RML, but they are more scalable and can process gigabytes of data.

Data2Services

Students on Linux or macOS who have already used Docker can use the d2s client, a scalable tool to convert input datasets to a target RDF knowledge graph. It uses SPARQL queries to map the input data to the target ontology instead of RML mappings. See the documentation.

pip install d2s cwlref-runner
d2s init

The client is written in Python 3 and uses docker-compose to run services and CWL to run workflows.

RMLStreamer

A new tool for RML processing that aims to be a scalable implementation of RML. RMLStreamer converts streams of data to RDF.

It will require you to start Apache Flink to stream the data (using Docker).

R2RML

You could also use R2RML. The RDB (Relational Database) to RDF Mapping Language is a precursor of RML. It allows you to define mappings for SQL databases (RML extends it to other formats, such as XML or JSON). R2RML has much faster and more scalable implementations, but it doesn't handle XML (you would need to convert the XML to CSV or load it into an RDB). R2RML doesn't support CSV natively, but CSV files can be exposed as a relational database (each file being a table) using Apache Drill.

See this repository for easy deployment of Apache Drill using Docker. Start it on your /data/r2rml directory:
docker run -dit --rm --name drill -v /data/r2rml:/data:ro -p 8047:8047 -p 31010:31010 umids/apache-drill:latest
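
The XML to CSV conversion mentioned above can be sketched with the Python standard library. The layout assumed here (`record` elements containing `field` elements with a `name` attribute) and the sample values are made up for illustration; adapt the tag names to the actual file.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_source, csv_file, record_tag="record",
                       field_tag="field", name_attr="name"):
    """Write one CSV row per record element, one column per named field.

    Rows are collected in memory first so the header can list every column.
    Returns the number of rows written.
    """
    rows, columns = [], []
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {}
        for field in elem.iter(field_tag):
            name = field.get(name_attr)
            if name:  # skip fields without a usable column name
                row[name] = (field.text or "").strip()
                if name not in columns:
                    columns.append(name)
        rows.append(row)
        elem.clear()  # the record has been copied, free its content
    writer = csv.DictWriter(csv_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hypothetical input with made-up values.
SAMPLE = b"""<Root><data>
  <record><field name="Country">Aruba</field><field name="Year">1980</field><field name="Value">100</field></record>
  <record><field name="Country">Aruba</field><field name="Year">1981</field></record>
</data></Root>"""

if __name__ == "__main__":
    out = io.StringIO(newline="")
    xml_records_to_csv(io.BytesIO(SAMPLE), out)
    print(out.getvalue())
```

To write a file that Apache Drill can read, pass `open("gdp.csv", "w", newline="")` as `csv_file` (the file name is only an example).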

OntoRefine

Developed from OpenRefine, OntoRefine is specialized in converting and processing data to RDF. It is included in your GraphDB installation. It allows you to load data from CSV or XML and apply some processing before converting it to RDF. See this tutorial for more information.

Python scripts

A common way to process data is still to pick your favorite scripting language and use it to process the data. This usually offers more possibilities, and libraries can help, but the mappings are not expressed explicitly in a mapping language, which makes them harder to read, share and reuse.
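
As an illustration of the scripting approach, the sketch below turns one row of country data into N-Triples lines using only the standard library. The `http://example.org/` URIs, the `areaInSqKm` property and the row keys are placeholders, not terms from the GeoNames ontology.

```python
import urllib.parse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_triples(row, base="http://example.org/country/",
                   vocab="http://example.org/vocab/"):
    """Yield N-Triples lines for a row with 'iso', 'name' and 'area' keys."""
    # Percent-encode the code so it is safe inside an IRI.
    subject = f"<{base}{urllib.parse.quote(row['iso'])}>"
    label = escape_literal(row["name"])
    yield f"{subject} <{RDF_TYPE}> <{vocab}Country> ."
    yield f'{subject} <{RDFS_LABEL}> "{label}" .'
    yield f'{subject} <{vocab}areaInSqKm> "{row["area"]}"^^<{XSD_DECIMAL}> .'


if __name__ == "__main__":
    # Hypothetical row; real rows would come from csv.DictReader or similar.
    for line in row_to_triples({"iso": "AW", "name": "Aruba", "area": "193"}):
        print(line)
```

Note how the mapping logic is spread across string templates here, which is the readability cost described above.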

Explore a graph using SPARQL

Be aware that count operations can be very time-consuming (depending on the dataset size), so you might want to remove them if a query times out.

Count all classes in the graph

select ?Concept (count(?Concept) as ?Count) # Count the instances in each ?Concept group
where {?s a ?Concept} # Match every resource ?s that is typed with a class ?Concept
group by ?Concept # One result row per distinct class
order by desc(?Count) # Order from the most used class to the least used

Get all properties for a Class

select ?Predicate (count(?Predicate) as ?Count) 
where {
	?s a <http://geonames.org/Country> .
	?s ?Predicate ?o .
} 
group by ?Predicate
order by desc(?Count)

Get all instances of a Class

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?instance ?label
where {
    ?instance a <http://geonames.org/Country> .
    OPTIONAL { ?instance rdfs:label ?label . } # Display the label if there is one
}
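
The queries above can also be sent from a script instead of the GraphDB web interface. This sketch assumes a local GraphDB at http://localhost:7200 and a repository ID such as `my_repo` (a placeholder); it posts the query to the repository's SPARQL endpoint following the standard SPARQL protocol. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

QUERY = """
select ?Concept (count(?Concept) as ?Count)
where { ?s a ?Concept }
group by ?Concept
order by desc(?Count)
"""


def build_sparql_request(query, repository_id, base_url="http://localhost:7200"):
    """Build a SPARQL protocol POST request for a GraphDB repository."""
    url = f"{base_url}/repositories/{urllib.parse.quote(repository_id)}"
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_query(query, repository_id):
    """Send the query and return the result bindings (needs a running GraphDB)."""
    request = build_sparql_request(query, repository_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    request = build_sparql_request(QUERY, "my_repo")  # placeholder repository ID
    print(request.get_method(), request.full_url)
```

Calling `run_query(QUERY, "my_repo")` returns one dictionary per result row, keyed by variable name.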
