dice-group / RDF-Triplestores-Evaluation

When is the Peak Performance Reached? An Analysis of RDF Triple Stores

We performed extensive experiments to measure the query processing capabilities of well-known triple stores through their SPARQL endpoints. In particular, we stressed these triple stores with multiple parallel requests from different querying agents. Our experiments reveal the maximum query processing capabilities of these triple stores, beyond which further load may lead to a denial of service (DoS). We hope this analysis will help triple store developers design workload-aware RDF engines that improve the availability of their public SPARQL endpoints by avoiding such DoS situations.

Persistent URI, License:

All of the data and results presented in our evaluation are available online at https://github.com/dice-group/RDF-Triplestores-Evaluation under the Apache License 2.0.

Datasets and Queries used:

Dataset           RDF Dump    Queries
DBpedia-3.5.1     Download    Download (queries generated by FEASIBLE)
WatDiv-10M        Download    Download
WatDiv-100M       Download    Download
WatDiv-1Billion   Download    Download

Triple Stores used:

  • Virtuoso (download here): Set the virtuoso.ini file accordingly. The file we used in our experiments is given here (see the loading sketch after this list).
  • Fuseki-TDB (download here): Download and unzip both apache-jena-fuseki-3.13.1 and apache-jena-3.13.1. Follow this tutorial for further guidance.
  • GraphDB (Docker Hub): sudo docker run -p 127.0.0.1:7200:7200 -v /path/to/dataset/files:/path/to/dataset/files --name <container_name> -e "GDB_JAVA_OPTS= -Dgraphdb.workbench.importDirectory=/path/to/dataset/files" ontotext/graphdb:9.0.0-free
  • Blazegraph (download here): After unzipping, run the BlazegraphStah.sh script, as given here.
  • Parliament (download here): After unzipping, run ./StartParliament.sh (given here) to start the server; then, in a new terminal, run java -cp "clientJars/*" com.bbn.parliament.jena.joseki.client.RemoteInserter <hostname> <port> <inputfile> [<graph-name>] to upload the dataset.
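
As an example of loading a dump, the sketch below bulk-loads N-Triples files into Virtuoso via its isql client. It is a minimal sketch, not our exact loading procedure: the port and credentials are Virtuoso's defaults, and the dataset path and graph IRI are placeholders.

```bash
#!/bin/bash
# Sketch: bulk-load N-Triples dumps into Virtuoso via the isql client.
# Assumes the default isql port 1111 and dba/dba credentials; the dataset
# directory must also be listed under DirsAllowed in virtuoso.ini.
isql 1111 dba dba exec="
    ld_dir('/path/to/dataset/files', '*.nt', 'http://example.org/graph');
    rdf_loader_run();
    checkpoint;
"
```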

Benchmark execution:

We used IGUANA, a benchmark execution framework, which can be downloaded from here. We set the iguana.config file (as given here) for each individual experiment according to the following guidelines (a sketch of the assembled file follows this list):

  • connection1.service=http://localhost:8895/sparql sets the SPARQL endpoint address.
  • connection1.update.service=http://localhost:8895/sparql (optional) is used for update/write operations. Our experiments issue only read operations (queries), so this line is disabled (commented out).
  • sparqlConfig1=<variable>, org.aksw.iguana.tp.tasks.impl.stresstest.worker.impl.SPARQLWorker, 600000, <path/to/queries.txt file>, 0, 0
  • Here, <variable> is the number of parallel workers (1, 2, 4, 8, 16, 32, 64, or 128), and 600000 milliseconds (= 10 minutes) is the query timeout.
  • stresstestArg.timeLimit=3600000 sets the time to complete one experiment, in milliseconds (1 hour).
  • All the experiments are read-based; therefore, the update component of IGUANA is disabled (commented out).
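
For concreteness, the sketch below assembles these settings into a config file for a single run. Only the keys quoted above are taken from our actual iguana.config; the chosen worker count, the query file path, and the heredoc approach are illustrative.

```bash
#!/bin/bash
# Minimal sketch: write an iguana.config for one read-only experiment.
WORKERS=16                     # number of parallel workers: 1, 2, 4, ..., 128
QUERIES=/path/to/queries.txt   # placeholder path to the benchmark query file

cat > iguana.config <<EOF
# SPARQL endpoint under test
connection1.service=http://localhost:8895/sparql
# update service disabled (commented out): read-only experiments
#connection1.update.service=http://localhost:8895/sparql

# workers, worker class, query timeout (600000 ms = 10 min), query file
sparqlConfig1=${WORKERS}, org.aksw.iguana.tp.tasks.impl.stresstest.worker.impl.SPARQLWorker, 600000, ${QUERIES}, 0, 0

# runtime of one experiment: 3600000 ms = 1 hour
stresstestArg.timeLimit=3600000
EOF
```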

Once the config file is ready, start the experiment with the following steps:

  • Inside the parent IGUANA folder, start IGUANA with ./start-iguana.sh. Once it has started successfully, open a new terminal and start the experiment by sending the iguana.config file to IGUANA's core processor: ./send-config.sh iguana.config.
  • Expect the results in the same folder as a results_*.nt file. Extract the Queries-per-Second (QpS) values from it with grep queriesPerSecond <result_file> > new_file. This new_file contains only the QpS values in RDF format (the object of each triple is a QpS value).
  • Upload this new_file to Virtuoso (in our case), run a SELECT query to retrieve all QpS values, and save them as a CSV file. Sum these values and take the average: this is the QpS (throughput) of the triple store for the given number of parallel users.
  • Repeat the experiments for different numbers of users, and then for the other triple stores in the same manner. NOTE: ensure the availability of the endpoint before starting IGUANA. Also, a bash script, all_exp_script (shown here), can be edited with little effort to perform all the experiments in one go; a minimal sketch of such a driver follows this list.
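
The sketch below is one way such a driver could look. It assumes IGUANA has already been started with ./start-iguana.sh and the endpoint is up; the hypothetical iguana.config.template (with a @WORKERS@ placeholder) and the awk-based averaging, which replaces the Virtuoso upload + SELECT step with a direct computation over the grep output, are illustrative.

```bash
#!/bin/bash
# Hypothetical driver: one IGUANA run per worker count, then average QpS.
set -eu

for WORKERS in 1 2 4 8 16 32 64 128; do
    # Fill the worker count into a config template (placeholder: @WORKERS@).
    sed "s/@WORKERS@/${WORKERS}/" iguana.config.template > iguana.config
    ./send-config.sh iguana.config

    # Each experiment runs for stresstestArg.timeLimit (1 h); wait with margin.
    sleep 3700

    # Newest result file written by IGUANA.
    RESULT=$(ls -t results_*.nt | head -n 1)

    # Average the QpS literals (objects of the queriesPerSecond triples).
    grep queriesPerSecond "${RESULT}" \
        | awk -F'"' '{ sum += $2; n++ } END { if (n) print sum / n }' \
        > "qps_${WORKERS}_workers.txt"
done
```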

Queries and Datasets statistics:

Diversity scores across different SPARQL query features of the selected benchmarks, as well as the percentages of SPARQL clauses and join vertex types of the queries we used, can be downloaded from here, as explained here.

Evaluation results:

All the results obtained were in .nt format, but we converted them to CSV, as discussed above. The overall results, along with plots, can be downloaded from here, while individual benchmark results are available at the links below:

DBpedia3.5.1-FEASIBLE results

WatDiv-100-Million triples results

WatDiv-1-Billion triples results

WatDiv-10-Million triples results

Publication:

Our findings and methodologies are detailed in the following publication:

  • Khan, H., Ali, M., Ngonga Ngomo, A.-C., & Saleem, M. (2021). When is the Peak Performance Reached? An Analysis of RDF Triple Stores. In Proceedings of SEMANTiCS 2021. Link to the paper

Citing:

@inproceedings{hashim2021peak,
  title={When is the Peak Performance Reached? An Analysis of RDF Triple Stores},
  author={Khan, Hashim and Ali, Manzoor and Ngonga Ngomo, Axel-Cyrille and Saleem, Muhammad},
  booktitle={Further with Knowledge Graphs: Proceedings of the 17th International Conference on Semantic Systems, 6-9 September 2021, Amsterdam, The Netherlands},
  volume={53},
  pages={154},
  year={2021},
  organization={IOS Press}
}

Authors:

Hashim Khan, Manzoor Ali, Axel-Cyrille Ngonga Ngomo, and Muhammad Saleem