cadmiumkitty / dcaf-2020-provo

Demo code for my talk at Data-Centric Architecture Forum 2020 about data provenance and PROV ontology.

Home Page: https://www.slideshare.net/EugeneMorozov/data-provenance-and-prov-ontology

Data Provenance and PROV-O Ontology talk at DCAF 2020

Introduction

This demo shows how to capture provenance information with the common PROV vocabulary in a repo trading and risk reporting scenario. It is built with the Event Sourcing and CQRS patterns on top of Kafka.
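To make the idea concrete, here is a minimal sketch of the kind of PROV statements such a scenario could record: a risk figure generated by a calculation that used a repo trade. It is not code from this repository. The `http://example.org/dcaf/` namespace, the class name and the identifiers are hypothetical; only the PROV terms (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) come from the W3C vocabulary.

```java
import java.util.List;

final class ProvSketch {
    static final String PROV = "http://www.w3.org/ns/prov#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    // Hypothetical namespace for demo resources; not taken from the repository.
    static final String EX = "http://example.org/dcaf/";

    /** Formats one N-Triples statement whose subject, predicate and object are all IRIs. */
    static String triple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Describes a risk figure produced by a calculation that used a repo trade. */
    static List<String> riskProvenance(String tradeId, String calcId, String riskId) {
        String trade = EX + "trade/" + tradeId;
        String calc = EX + "calculation/" + calcId;
        String risk = EX + "risk/" + riskId;
        return List.of(
            triple(trade, RDF_TYPE, PROV + "Entity"),
            triple(calc, RDF_TYPE, PROV + "Activity"),
            triple(risk, RDF_TYPE, PROV + "Entity"),
            triple(calc, PROV + "used", trade),
            triple(risk, PROV + "wasGeneratedBy", calc),
            triple(risk, PROV + "wasDerivedFrom", trade));
    }

    public static void main(String[] args) {
        // Print the six statements as N-Triples, one per line.
        riskProvenance("T1", "C1", "R1").forEach(System.out::println);
    }
}
```

Plain strings keep the sketch dependency-free; a real producer would more likely build the statements with an RDF library such as Jena.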

Set up

  1. Single Kafka node with a single ZooKeeper node
  2. Repo producer that creates and amends repo trades based on trade events
  3. Counterparty producer that creates and amends counterparty records
  4. Risk calculator that calculates risk figures based on repo events
  5. Provenance aggregator as a Kafka Connect node
  6. Simple Jena Fuseki triplestore to aggregate PROV data
  7. PROV-O-Viz set up for simple visualization
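As an illustration of how components 5 and 6 could fit together, a sink might wrap a batch of PROV statements in a SPARQL Update `INSERT DATA` request before sending it to the triplestore. The sketch below is an assumption about one possible approach, not the connector shipped in this repository; the class and method names are hypothetical, and it only builds the update text without sending it.

```java
import java.util.List;

final class ProvSinkSketch {
    /**
     * Wraps N-Triples statements (each already terminated by " .")
     * in a single SPARQL Update INSERT DATA request.
     */
    static String insertData(List<String> nTriples) {
        StringBuilder update = new StringBuilder("INSERT DATA {\n");
        for (String statement : nTriples) {
            update.append("  ").append(statement).append('\n');
        }
        return update.append("}").toString();
    }

    public static void main(String[] args) {
        // Hypothetical example.org IRIs; only prov:used is a real PROV term.
        String statement = "<http://example.org/dcaf/calculation/C1> "
            + "<http://www.w3.org/ns/prov#used> "
            + "<http://example.org/dcaf/trade/T1> .";
        System.out.println(insertData(List.of(statement)));
    }
}
```

Batching several statements into one update keeps the number of requests to the triplestore low, which matters when a sink receives records in bulk.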

Running the demo

Build the individual projects under repo (trade and counterparty events, risk calculator and event processor) and connect (Kafka Connect SPARQL sink for PROV) with mvn clean package.

Build and start containers with:

docker-compose up -d --build

Once the containers are up and running, you can check that PROV triples are being created in Jena by going to http://localhost:3030/dataset.html?tab=query&ds=/dcaf and issuing a simple SPARQL query such as:

SELECT *
WHERE {
  ?s ?p ?o
}
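The same check can be scripted instead of using the Fuseki UI. The sketch below uses the JDK's built-in HTTP client (Java 11 or later) to send the query over the SPARQL protocol. It assumes the query service is reachable from the host at `http://localhost:3030/dcaf/query`, which is inferred from the URLs in this README rather than confirmed; the class name is hypothetical, and the `LIMIT 10` is added only to keep the output short.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

final class FusekiQuerySketch {
    // Assumed host-side query endpoint for the /dcaf dataset.
    static final String ENDPOINT = "http://localhost:3030/dcaf/query";

    /** Builds a SPARQL protocol GET request with the query URL-encoded. */
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            .timeout(Duration.ofSeconds(10))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = buildRequest(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Typically means the containers are not running yet.
            System.err.println("Could not reach " + ENDPOINT + ": " + e.getMessage());
        }
    }
}
```

If the containers are running and the assumed endpoint is correct, the response body should be a SPARQL JSON result set listing up to ten triples.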

To view the visualization, go to http://localhost:5000/, select the endpoint http://fuseki:3030/dcaf/query, and tell PROV-O-Viz to Ignore Named Graphs.

Languages

Java 95.6%, Dockerfile 4.4%