uma-pi1 / wikidata5m-si

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs

This is the benchmark, code, and configuration accompanying the EMNLP-Findings 2023 paper A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs. The main branch holds code/information about the benchmark itself. The following branches hold code and configuration for the separate models evaluated in the study.

Benchmark

Download data

mkdir data
cd data
curl -O https://madata.bib.uni-mannheim.de/424/2/wikidata5m-si.tar.gz
tar -zxvf wikidata5m-si.tar.gz

Generate Few Shot Tasks

  • use the file prepare_few_shot.py
  • create a few_shot_set_creator object
    • dataset_name: (str) name of the dataset
      • default: wikidata5m_v3_semi_inductive
    • use_invese: (bool) whether to use inverse relations
      • default: False
      • if True: for all triples where the unseen entity is in the object slot, increase relation id by num-relations and invert triple
    • split: (str) which split to use
      • default: valid
    • context_selection: (str) which context_selection technique to use
      • default: most_common
      • options: most_common, least_common, random
few_shot_set_creator = FewShotSetCreator(
	dataset_name="wikidata5m_v3_semi_inductive",
	use_inverse=True,
	split="test"
)
  • generate the data using the few_shot_set_creator
    • num_shots: (int) the number of shots to use (between 0 and 10)
data = few_shot_set_creator.create_few_shot_dataset(num_shots=5)
  • evaluation is performed in direction unseen to seen
  • output format looks like this
[
{
	"unseen_entity": <id of unseen entity>,
	"unseen_slot": <slot of unseen entity: 0 for head/subject, 2 for tail/object>,
	"triple: <[s, p, o]>,
	"context: <[unseen_entity_id, unseen_entity_slot, s, p, o]>
},
...

]

Create Benchmarks Based on Other Graphs

  • to create similar benchmark based on other graphs use the file create_semi_inductive_dataset.py
  • this file was used to create wikidata5m-si based on wikidata5m

How to Cite

  • if you use the proposed benchmark, the provided code or insights presented in the paper please cite.
@inproceedings{kochsiek2023benchmark,                                                                                                                                                                  
title={A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs},
author={Kochsiek, Adrian and Gemulla, Rainer},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
year={2023}
}

About


Languages

Language:Python 100.0%