Efficient RDF Interchange (ERI) Format for RDF Data Streams

ERI

Motivation. RDF Streams

RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates, and be consumed by applications that require real-time responses. Hence it is important to publish and interchange them efficiently.

What is ERI?

The Efficient RDF Interchange (ERI) format is a compressed serialization for RDF streams. It exploits a key feature of RDF data streams, the regularity of their structure and data values, to reduce the amount of data transmitted when processing RDF streams. ERI achieves significant space savings with respect to standard data streaming compression while remaining efficient in performance.

More information

The ERI proposal was published at the International Semantic Web Conference 2014:

Fernández, J. D., Llaves, A., & Corcho, O. (2014, October). Efficient RDF interchange (ERI) format for RDF data streams. In International Semantic Web Conference (pp. 244-259). Springer, Cham.

Authors

  • Javier D. Fernández, Vienna University of Economics and Business (Austria);
  • Alejandro Llaves, Fujitsu Laboratories of Europe (Spain);
  • Óscar Corcho, Ontology Engineering Group (OEG), Univ. Politécnica de Madrid (Spain);

ACK

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 257641, PlanetData network of excellence.

Use the code

Import the code and use the available tools in /src/org/oegupm/compactstreaming/tools:

RDF2StreamFile <input RDF> <outputRDF_Comp>

Converts a given RDF input to ERI. Parameters:

 -rdftype : Type of RDF Input (ntriples, nquad, n3, turtle, rdfxml)
 -base : Base URI for the dataset
 -config : Config file for the conversion
 -prefixes : File including the URIs to be treated as prefixes during the conversion (one per line)
 -discrete : File including the URIs of the discrete predicates (one per line), i.e. those predicates followed by few different object values.
 -uniq : File including the URIs of those predicates (one per line) whose objects are mostly unrepeated.
 -block : Number of Triples per Block
 -quiet : Do not show progress of the conversion

StreamFile2RDF <input RDF_Comp> <outputRDF>

Converts ERI back to plain RDF (only ntriples supported). Parameters:

 -quiet : Do not show progress of the conversion
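
The -discrete and -uniq options of RDF2StreamFile expect lists of predicate URIs, and a quick look at a data sample can help decide which predicates belong in each list. The following is a minimal sketch of such a check, not part of the ERI code base: the class PredicateStats, its methods and the thresholds are hypothetical, the N-Triples parsing is deliberately naive (it splits on whitespace), and writing the URIs without angle brackets is an assumption about the expected file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for inspecting a sample; not part of the ERI tools.
class PredicateStats {

    /** Object statistics for a single predicate. */
    static final class Counts {
        final Set<String> distinctObjects = new HashSet<>();
        int total = 0;
    }

    /** Gathers per-predicate object counts from simple N-Triples lines. */
    static Map<String, Counts> collect(List<String> lines) {
        Map<String, Counts> stats = new TreeMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue; // skip blanks and comments
            // Naive split: subject, predicate, then the object plus the final " ."
            String[] parts = t.split("\\s+", 3);
            if (parts.length < 3) continue;
            String object = parts[2].replaceFirst("\\s*\\.\\s*$", "");
            Counts c = stats.computeIfAbsent(stripBrackets(parts[1]), k -> new Counts());
            c.distinctObjects.add(object);
            c.total++;
        }
        return stats;
    }

    /** Removes the surrounding angle brackets of an IRI term, if present. */
    static String stripBrackets(String term) {
        if (term.length() >= 2 && term.startsWith("<") && term.endsWith(">")) {
            return term.substring(1, term.length() - 1);
        }
        return term;
    }

    /** Predicates with at most maxDistinct different object values. */
    static List<String> discreteCandidates(Map<String, Counts> stats, int maxDistinct) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Counts> e : stats.entrySet()) {
            if (e.getValue().distinctObjects.size() <= maxDistinct) out.add(e.getKey());
        }
        return out;
    }

    /** Predicates whose share of distinct objects is at least minRatio. */
    static List<String> uniqCandidates(Map<String, Counts> stats, double minRatio) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Counts> e : stats.entrySet()) {
            Counts c = e.getValue();
            if ((double) c.distinctObjects.size() / c.total >= minRatio) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "<http://ex.org/s1> <http://ex.org/unit> \"celsius\" .",
            "<http://ex.org/s1> <http://ex.org/value> \"21.5\" .",
            "<http://ex.org/s2> <http://ex.org/unit> \"celsius\" .",
            "<http://ex.org/s2> <http://ex.org/value> \"21.7\" .");
        Map<String, Counts> stats = collect(sample);
        System.out.println("discrete candidates: " + discreteCandidates(stats, 1));
        System.out.println("uniq candidates: " + uniqCandidates(stats, 0.9));
    }
}
```

Each resulting list could then be written one URI per line and passed via -discrete or -uniq. The thresholds (1 and 0.9) are illustrative only; suitable values depend on the stream.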

Configuration

Configuration file

Please specify conversion parameters via a config file using the schema <property>=<value>

Example:

store_subject_dictionary=false
store_object_dictionary=false
disable_consistent_predicates=false
block_size=4096

  • store_subject_dictionary : Boolean to indicate if an LRU cache of subjects is used (improves compression if subjects are highly repeated)
  • store_object_dictionary : Boolean to indicate if an LRU cache of objects is used (improves compression if objects are highly repeated)
  • disable_consistent_predicates : By default, ERI assumes that all literal values of a given predicate are of the same data type (float, string, dateTime, etc.). If this cannot be assumed, set this property to true to disable the feature.
  • block_size : Integer value indicating the number of triples per block.
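
Since the config file follows the <property>=<value> schema, it can be read with the standard java.util.Properties class. The following is a minimal sketch of that idea, not the loader ERI itself uses: the class EriConfigSketch and its field names are hypothetical, and the fallback values are simply copied from the example above rather than taken from documented ERI defaults.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the four properties listed above.
class EriConfigSketch {
    final boolean storeSubjectDictionary;
    final boolean storeObjectDictionary;
    final boolean disableConsistentPredicates;
    final int blockSize;

    private EriConfigSketch(Properties p) {
        // Fallbacks mirror the example file; they are assumptions, not ERI defaults.
        storeSubjectDictionary = bool(p, "store_subject_dictionary", "false");
        storeObjectDictionary = bool(p, "store_object_dictionary", "false");
        disableConsistentPredicates = bool(p, "disable_consistent_predicates", "false");
        blockSize = Integer.parseInt(p.getProperty("block_size", "4096").trim());
    }

    private static boolean bool(Properties p, String key, String fallback) {
        return Boolean.parseBoolean(p.getProperty(key, fallback).trim());
    }

    /** Reads <property>=<value> lines from any character source. */
    static EriConfigSketch load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new EriConfigSketch(p);
    }

    /** Convenience wrapper for in-memory text. */
    static EriConfigSketch fromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        EriConfigSketch cfg = fromString(
            "store_subject_dictionary=true\n" +
            "block_size=1024\n");
        System.out.println("subject dictionary: " + cfg.storeSubjectDictionary);
        System.out.println("object dictionary: " + cfg.storeObjectDictionary);
        System.out.println("block size: " + cfg.blockSize);
    }
}
```

To read an actual file, a java.io.FileReader can be passed to load in place of the StringReader.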

License: GNU Lesser General Public License v3.0

