MicroRNAs (miRNAs) play crucial roles in many biological processes involved in diseases, and they function together with protein-coding genes (PCGs). In this study, we present DimiG, a semi-supervised multi-label framework that integrates PCG-PCG interactions, PCG-miRNA interactions, and PCG-disease associations using a graph convolutional network (GCN). DimiG is trained on the resulting graph and is then used to score associations between diseases and miRNAs.
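As a rough illustration of the idea, the sketch below applies two graph-convolution steps to a toy graph of PCG and miRNA nodes and produces one score per node and disease label. It is not DimiG's implementation: the symmetric normalization follows the standard GCN formulation, the per-label sigmoid output is only an assumption about how multi-label scoring could be done, and the toy graph, weights, and function names are invented for illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(a_norm, features, weights, activation):
    """One propagation step: activation(A_norm * H * W)."""
    z = matmul(matmul(a_norm, features), weights)
    return [[activation(v) for v in row] for row in z]

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy graph (hypothetical): nodes 0 and 1 are PCGs, node 2 is a miRNA linked to PCG 1.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node features
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # arbitrary weights, 3 inputs -> 2 hidden units
w2 = [[0.3, -0.1], [0.2, 0.4]]               # arbitrary weights, 2 hidden units -> 2 disease labels

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, w1, relu)
# Assumption: independent sigmoid per disease label, so a node can score high for several diseases.
scores = gcn_layer(a_norm, hidden, w2, sigmoid)
print("miRNA node scores per disease: %s" % scores[2])
```

In this toy setup the miRNA node carries no disease labels of its own; its scores arise only from information propagated through its PCG neighbor, which is the intuition behind scoring miRNAs on a PCG-labeled graph.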
DimiG requires only a standard computer with enough RAM to support the operations. For minimal performance, this means a computer with about 8 GB of RAM. For optimal performance, we recommend a computer with the following specs:
- RAM: 8+ GB
- CPU: Intel® Core™ i5-3337U CPU @ 1.80GHz × 4
This package is supported for Linux operating systems. The package has been tested on the following systems:
- Linux: Ubuntu 16.04

DimiG has the following dependencies:
- sklearn
- GCN
- PyTorch 0.4 or 0.5
- Python 2.7
Here we modified the original GCN (https://github.com/tkipf/pygcn) to support multi-label learning. Install it with:
python setup.py install
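After installation, a quick check that the dependencies can be imported may save a failed run later. The snippet below is a minimal sketch; the import names `sklearn`, `torch`, and `pygcn` are assumptions based on the usual package names and may differ for the modified GCN shipped with this repository.

```python
import importlib

def check_modules(names):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    # Assumed import names: scikit-learn -> sklearn, PyTorch -> torch, GCN -> pygcn.
    for name, ok in sorted(check_modules(["sklearn", "torch", "pygcn"]).items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```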
To run the demo code, several large files need to be downloaded from other websites:
- PCG-PCG interaction file "9606.protein.links.v10.txt.gz" can be downloaded from STRING v10 database.
- Disease-PCG associations file "human_disease_integrated_full.tsv" can be downloaded from DISEASES database. We have also uploaded it to this repository as human_disease_integrated_full.zip; please decompress it into the directory data/.
- PCG-miRNA interaction file "9606.v1.combined.tsv.gz" can be downloaded from RAIN v1.0 database.
- The above three files need to be saved in the directory "data/".
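Before running the demo, it can help to confirm that the three files listed above are in place. The following is a minimal sketch that assumes it is run from the repository root, with the files stored under their original names in "data/"; the function name is hypothetical.

```python
import os

# File names as given in the list above.
REQUIRED_FILES = [
    "9606.protein.links.v10.txt.gz",      # PCG-PCG interactions (STRING v10)
    "human_disease_integrated_full.tsv",  # disease-PCG associations (DISEASES)
    "9606.v1.combined.tsv.gz",            # PCG-miRNA interactions (RAIN v1.0)
]

def missing_data_files(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(data_dir, name))]

if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files in data/: %s" % ", ".join(missing))
    else:
        print("All required data files found.")
```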
You can directly get the predictions and plot the ROC curve by running:
python DimiG.py
It will output the training and validation loss. In total, it takes less than 20 minutes to run.
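To summarize an ROC curve as a single number, the area under the curve (AUC) can be computed from predicted scores and known labels. The sketch below is a generic pairwise formulation, not the evaluation code in DimiG.py; the function name and example values are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = known disease-miRNA association, 0 = no known association.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print("AUC = %.2f" % roc_auc(labels, scores))  # prints "AUC = 0.75"
```

The pairwise loop is quadratic in the number of examples, so for large prediction sets `sklearn.metrics.roc_auc_score` from scikit-learn, which is already a dependency, is the more practical choice.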