graph-database api big-data natural-language-processing bioinformatics data-mining

EpiGraphDB: A database and data mining platform for health data science

EpiGraphDB is an analytical platform and database to support data mining in epidemiology. The platform incorporates a graph of causal estimates generated by systematically applying Mendelian randomization to a wide array of phenotypes, and augments this with a wealth of additional data from other bioinformatic sources. EpiGraphDB aims to support appropriate application and interpretation of causal inference in systematic automated analyses of many phenotypes.

This repository contains example use cases to demonstrate the functionalities of EpiGraphDB, via the API using Jupyter notebooks in Python. The following table lists the main components of EpiGraphDB and their sources.

EpiGraphDB Component	Source
Data integration pipeline	Github
Web UI	Github
API	Github
R package	Github
Example use cases (this repo)	Github (this repo)

*We will gradually open source our components in the future months.

Example use cases

Below are the example use case notebooks. You can either use Google Colab or Binder to play with these notebooks online, or clone the repo and set up a local Jupyter lab environment.

Case studies for the EpiGraphDB paper
Case study 1: Distinguishing vertical and horizontal pleiotropy for SNP-protein associations
Case study 2: Identification of potential drug targets
Case study 3: Triangulating causal estimates with literature evidence
General examples
Getting started with EpiGraphDB in Python
Functionalities in getting metadata, entity search, and using Cypher queires

The example notebooks above are done in Python using the EpiGraphDB API. R users can visit the package vignettes for equivalent functionalities using the epigraphdb R package.

Set up

We provide a conda environment configuration file for readers to run the notebooks locally.

To do this first you will need to install conda. Then follow the steps below to set up the conda environment.

# Bootstrap the environment
conda env create -f environment.yml

# Activate the environment in your shell session.
conda activate epigraphdb-notebooks

# Open Jupyter lab and you should be able to run the code examples!
jupyter lab

Citation

Please cite EpiGraphDB as

Yi Liu, Benjamin Elsworth, Pau Erola, Valeriia Haberland, Gibran Hemani, Matt Lyon, Jie Zheng, Oliver Lloyd, Marina Vabistsevits, Tom R Gaunt, EpiGraphDB: a database and data mining platform for health data science, Bioinformatics, btaa961, https://doi.org/10.1093/bioinformatics/btaa961

@article{epigraphdb2020bioinformatics,
    author = {Liu, Yi and Elsworth, Benjamin and Erola, Pau and Haberland, Valeriia and Hemani, Gibran and Lyon, Matt and Zheng, Jie and Lloyd, Oliver and Vabistsevits, Marina and Gaunt, Tom R},
    title = {{EpiGraphDB}: a database and data mining platform for health data science},
    journal = {Bioinformatics},
    year = {2020},
    month = {11},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa961},
    url = {https://doi.org/10.1093/bioinformatics/btaa961},
    note = {btaa961},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa961/34178613/btaa961.pdf}
}

Contact

Please get in touch with us for issues, comments, suggestions, etc. via the following methods:

About

Examples on using EpiGraphDB

http://epigraphdb.org

graph-database api big-data natural-language-processing bioinformatics data-mining

Languages

Language:Jupyter Notebook 99.7%Language:Makefile 0.3%