malllabiisc / kg-geometry

Towards Understanding the Geometry of Knowledge Graph Embeddings

This repository contains the code for generating the results in the paper "Towards Understanding the Geometry of Knowledge Graph Embeddings", presented at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, July 15-20, 2018.

Required data format

The analysis requires pre-trained KG embeddings along with the KG triples data. The triples data should be a Python 2.7 pickle file named "<dataset>.<method>.bin" containing the following keys:

  1. 'train_subs': list of KG triples used for training, in (head_entity_index, tail_entity_index, relation_index) format.
  2. 'valid_subs': list of KG triples used for validation, in (head_entity_index, tail_entity_index, relation_index) format.
  3. 'test_subs': list of KG triples used for testing, in (head_entity_index, tail_entity_index, relation_index) format.
  4. 'relations': list of KG relations.
  5. 'entities': list of KG entities.
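
As a concrete illustration, the following minimal sketch creates a triples file in this format. The dataset name ("fb15k"), method name ("transe"), and the toy entities, relations, and triples are hypothetical placeholders.

  # build_triples_file.py -- minimal sketch of the "<dataset>.<method>.bin" format
  import pickle

  entities = ['/m/entity_a', '/m/entity_b', '/m/entity_c']   # toy entity list
  relations = ['/relation_x', '/relation_y']                 # toy relation list

  data = {
      # each triple is (head_entity_index, tail_entity_index, relation_index)
      'train_subs': [(0, 1, 0), (1, 2, 1)],
      'valid_subs': [(0, 2, 0)],
      'test_subs':  [(2, 1, 1)],
      'relations':  relations,
      'entities':   entities,
  }

  # protocol 2 keeps the file readable from Python 2.7, which this codebase targets
  with open('fb15k.transe.bin', 'wb') as f:
      pickle.dump(data, f, protocol=2)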

The KG embeddings should be stored as a Python 2.7 pickle file named "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p" containing the following keys:

  1. 'rNames' : list of KG relations.
  2. 'eNames' : list of KG entities.
  3. 'E' : numpy array of shape (numEntities, dimension) containing the entity embeddings.
  4. 'R' : numpy array of shape (numRelations, dimension) containing the relation embeddings.
  5. 'model' : model name.
  6. 'fpos test' : ranks of the head and tail entities obtained during link prediction, required for the performance analysis. It should be a dictionary with relation indices as keys, e.g. {rel_id1: {'head': [head_rank_1, head_rank_2, ...], 'tail': [tail_rank_1, tail_rank_2, ...]}}.
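
Similarly, the sketch below builds a minimal embeddings file; the file name, the TransE model name, the dimension (100), the single negative sample, and all ranks are hypothetical placeholders.

  # build_embeddings_file.py -- minimal sketch of the
  # "<dataset>.<method>.n<no-of-negatives>.d<dimension>.p" format
  import pickle
  import numpy as np

  num_entities, num_relations, dim = 3, 2, 100

  emb = {
      'rNames': ['/relation_x', '/relation_y'],                  # same order as in the triples file
      'eNames': ['/m/entity_a', '/m/entity_b', '/m/entity_c'],
      'E': np.random.randn(num_entities, dim),                   # entity embeddings
      'R': np.random.randn(num_relations, dim),                  # relation embeddings
      'model': 'TransE',
      # link-prediction ranks per relation index, used by the performance analysis
      'fpos test': {
          0: {'head': [1, 5], 'tail': [2, 3]},
          1: {'head': [10], 'tail': [7]},
      },
  }

  with open('fb15k.transe.n1.d100.p', 'wb') as f:
      pickle.dump(emb, f, protocol=2)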

Running type analysis

To run the type analysis (Section 5.1 in the paper), use the following commands:

  1. python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
  2. python typeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)
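
For example, assuming the files described above are placed in ./data, the dataset is named fb15k, and the plots should go to ./plots (all hypothetical values), the first command would be invoked as:

  python typeAnalysis.py -m ./data -d fb15k -g conicity --opdir ./plots --type ent

The negative, dimension, and performance analyses below follow the same pattern.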

Running negative analysis

To run the negative analysis (Section 5.2 in the paper), use the following commands:

  1. python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
  2. python negativeAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)

Running dimension analysis

To run the dimension analysis (Section 5.3 in the paper), use the following commands:

  1. python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel>
  2. python dimensionAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result (for generating the plots)

Running performance analysis

To run the performance analysis (Section 5.4 in the paper), use the following commands:

  1. python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> -p <performance-file>

  2. python perfAnalysis.py -m <data-directory> -d <dataset-name> -g <conicity/length> --opdir <output-directory> --type <ent/rel> --result -p <performance-file> (for generating the plots)

    Here, <performance-file> is a pickle file containing the performance of different models. It is a nested dictionary: perf['<method>'][<dimension>][<numNegatives>] should contain the performance {'MRR': <MRR-value>, 'MR': <MR-value>, 'Hits@10': <Hits@10-value>} for <method> with embedding dimension <dimension> and <numNegatives> negative samples.
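
The sketch below shows how such a performance file could be assembled; the method name, dimensions, negative-sample counts, and metric values are all hypothetical.

  # build_performance_file.py -- minimal sketch of the <performance-file> structure
  import pickle

  perf = {
      'TransE': {
          100: {
              1:  {'MRR': 0.45, 'MR': 120, 'Hits@10': 0.62},
              50: {'MRR': 0.47, 'MR': 110, 'Hits@10': 0.64},
          },
      },
  }

  with open('performance.p', 'wb') as f:
      pickle.dump(perf, f, protocol=2)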

Citation

If you find our work or this codebase useful, please cite us:

@inproceedings{chandrahas-etal-2018-towards,
  title = "Towards Understanding the Geometry of Knowledge Graph Embeddings",
  author = "{Chandrahas}  and
    Sharma, Aditya  and
    Talukdar, Partha",
  booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2018",
  address = "Melbourne, Australia",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/P18-1012",
  pages = "122--131",
  }

License

Apache License 2.0

