Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks

This repository contains the code for training and finetuning the models in "Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks".

Authors: Aryo Pradipta Gema, Dominik Grabarzcyk, Wolf De Wulf, Piyush Borole, Dr. Javier Alfaro, Dr. Pasquale Minervini, Dr. Antonio Vergari, Dr. Ajitha Rajan

1. Installation

Create a conda environment using the environment.yml file:

conda env create -f environment.yml

Activate the environment:

conda activate kge

Clone and install libkge:

git clone git@github.com:uma-pi1/kge.git
cd kge
pip install -e .

To deactivate the environment:

conda deactivate

2. Knowledge Graphs

The knowledge graphs used are those from the v1.0.0 release of BIOKG:

- BIOKG
- BIOKG benchmarks:
  - ddi_efficacy
  - ddi_minerals
  - dpi_fda
  - dep_fda_exp

Download them using the download.py script:

python scripts/data/download.py --help

The download script uses a fixed seed, which ensures that the downloaded datasets are identical to the ones used in our evaluations. We can also provide the datasets upon request.
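Assuming the seed controls how the downloaded triples are shuffled and split (how download.py uses it is not documented here), the following generic Python sketch illustrates why fixing a seed makes the result reproducible. The helper split_triples and the toy triples are hypothetical and are not taken from the script.

```python
import random


def split_triples(triples, seed, valid_frac=0.1, test_frac=0.1):
    """Shuffle triples with a fixed seed and cut them into train/valid/test.

    Hypothetical helper for illustration only; it is not taken from download.py.
    """
    shuffled = list(triples)
    # A dedicated Random instance seeded with `seed` yields the same permutation on every run
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


triples = [(f"drug{i}", "interacts_with", f"drug{i + 1}") for i in range(100)]
first = split_triples(triples, seed=42)
second = split_triples(triples, seed=42)
print(first == second)  # True: same seed, same splits
```

Using a local random.Random instance rather than the global random.seed keeps the split independent of any other code that draws random numbers in the same process.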

The datasets use the libkge dataset format. Once downloaded, the dataset folders need to be moved to kge/data inside the cloned libkge repository.
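A minimal Python sketch of this step, assuming the usual libkge layout: a dataset.yaml plus train.del, valid.del, test.del, entity_ids.del and relation_ids.del, where each split file holds tab-separated integer triples. The helpers check_libkge_dataset and install_dataset are hypothetical and not part of this repository; compare the expected file list against what download.py actually writes.

```python
import shutil
from pathlib import Path

# Assumed libkge dataset layout; compare with what download.py actually writes
EXPECTED_FILES = ["dataset.yaml", "entity_ids.del", "relation_ids.del",
                  "train.del", "valid.del", "test.del"]


def check_libkge_dataset(folder: Path) -> None:
    """Raise if `folder` does not look like a libkge dataset (hypothetical helper)."""
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{folder} is missing {missing}")
    # Each line of a split file should hold three tab-separated integer indices
    for split in ("train.del", "valid.del", "test.del"):
        with open(folder / split) as f:
            for line_no, line in enumerate(f, start=1):
                fields = line.rstrip("\n").split("\t")
                if len(fields) != 3 or not all(x.isdigit() for x in fields):
                    raise ValueError(f"{split}:{line_no} is not an index triple: {line!r}")


def install_dataset(src: Path, kge_dir: Path) -> Path:
    """Check a downloaded dataset folder, then move it into <kge_dir>/data (hypothetical helper)."""
    check_libkge_dataset(src)  # validate first so broken downloads are not moved
    target = kge_dir / "data" / src.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(target))
    return target
```

For example, assuming ddi_efficacy was downloaded into the current directory, install_dataset(Path("ddi_efficacy"), Path("kge")) would move it to kge/data/ddi_efficacy. Validating before moving leaves an incomplete download where it was, so it can simply be downloaded again.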

3. Experimental Evaluations

Link Prediction

All configuration files for the link prediction evaluations mentioned in the article can be found in the configs/link_prediction folder.
Please read through the libkge documentation to find out how to use them.
To run the evaluations in which models are initialised with pretrained embeddings, first download the models folder from the supplementary material.
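libkge starts a job from a YAML config with its kge start command, and individual config keys can be overridden on the command line (libkge's own README uses --job.device cpu as an example). The Python wrapper below is a hypothetical convenience sketch, not part of this repository, and the config path is a placeholder, since the individual files in configs/link_prediction are not listed here.

```python
import shutil
import subprocess


def kge_start(config_path, device="cuda", dry_run=False):
    """Build (and optionally run) a libkge `kge start` command for one config file.

    Hypothetical helper; `device` is passed via libkge's --job.device override.
    """
    cmd = ["kge", "start", str(config_path), "--job.device", device]
    if dry_run:
        # Only return the command so it can be inspected or logged
        return cmd
    if shutil.which("kge") is None:
        raise RuntimeError("`kge` not found; activate the kge environment and install libkge first")
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the job fails
    return cmd


print(kge_start("configs/link_prediction/example.yaml", device="cpu", dry_run=True))
```

The dry_run flag makes it easy to check the generated command before launching a long-running job.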

Warning: The hyperparameter optimisation (HPO) runs can take up to a week to finish, and some of the generated configurations may require a high-end GPU to run at all. During our research, these HPO runs were run on HPC clusters.

Relation Classification

All configuration files for the relation classification evaluations mentioned in the article can be found in the configs/relation_classification folder.
To reproduce our results, use the relation_classification.py script in combination with one of the config files:

python scripts/benchmarks/relation_classification.py --help

4. Questions

Feel free to contact any of the authors via email if you have questions.
