dgarijo / kgtk-delta-wikidata

This repository contains the code for calculating deltas between two sorted KGTK files. The effort is targeted at capturing the differences between different Wikidata endpoints.

To use the script, run:

python calculate_deltas.py -o path_to_old_dataset -n path_to_new_dataset -d output_directory

Here, path_to_old_dataset is the path to the older dataset in the comparison, path_to_new_dataset is the path to the more recent dataset, and output_directory is the directory where the outputs will be written. Three outputs are produced: added.tsv, which contains the statements newly added in the more recent dataset; deleted.tsv, which contains the statements deleted from the more recent dataset; and modified_qual.tsv, which contains the statements whose qualifiers have been modified.

For all files, only the id, node1, label, and node2 columns are kept for simplicity.
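To make the idea of a delta over sorted files concrete, here is a minimal sketch of a merge-style comparison. It is not the code in calculate_deltas.py. The function names read_edges and sorted_delta are hypothetical, and the sketch assumes both inputs are tab-separated, carry the id, node1, label, and node2 columns, and are sorted by id in plain code-point order. It covers only added and deleted statements; detecting modified qualifiers is left out.

```python
import csv
import io

# Columns kept in the outputs, as described above.
COLUMNS = ["id", "node1", "label", "node2"]


def read_edges(handle):
    """Yield rows from a TSV file handle, restricted to the four core columns."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {col: row[col] for col in COLUMNS}


def sorted_delta(old_rows, new_rows):
    """Walk two id-sorted row streams in parallel; return (added, deleted).

    Assumption: ids compare the same way the files were sorted
    (plain string comparison here).
    """
    added, deleted = [], []
    old_iter, new_iter = iter(old_rows), iter(new_rows)
    old, new = next(old_iter, None), next(new_iter, None)
    while old is not None or new is not None:
        if new is None or (old is not None and old["id"] < new["id"]):
            # The id exists only in the old file.
            deleted.append(old)
            old = next(old_iter, None)
        elif old is None or new["id"] < old["id"]:
            # The id exists only in the new file.
            added.append(new)
            new = next(new_iter, None)
        else:
            # Same id on both sides: not an addition or a deletion.
            old = next(old_iter, None)
            new = next(new_iter, None)
    return added, deleted


# Tiny in-memory example with made-up statement ids.
HEADER = "id\tnode1\tlabel\tnode2\n"
OLD_TSV = HEADER + "Q1-P31-Q5\tQ1\tP31\tQ5\n" + "Q2-P31-Q5\tQ2\tP31\tQ5\n"
NEW_TSV = HEADER + "Q2-P31-Q5\tQ2\tP31\tQ5\n" + "Q3-P31-Q5\tQ3\tP31\tQ5\n"

added, deleted = sorted_delta(
    read_edges(io.StringIO(OLD_TSV)),
    read_edges(io.StringIO(NEW_TSV)),
)
print("added:", [row["id"] for row in added])
print("deleted:", [row["id"] for row in deleted])
```

Because both inputs are consumed as streams and only advanced one row at a time, this kind of comparison does not need to hold either dataset in memory, which is the usual reason for requiring sorted input.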

License: Apache License 2.0

