mims-harvard / decagon

Graph convolutional neural network for multirelational link prediction

Home Page: http://snap.stanford.edu/decagon



Can you supply instructions on how to use real-world data to train the model?

Dinxin opened this issue

I read through the whole codebase and found that it only uses toy dummy data to train the model, so I don't really understand how you use the real data to train the GCN model. Can you supply the code or instructions for training the model on real-world data?

  • It's also not clear how to get predictions from the trained model on new data or a new pair of drugs. Do I put in SIDER codes? STITCH? Other codes? In what format?

I am also confused about how to apply the model to the real data.

Please give instructions on how to apply the actual dataset in the code. It is very difficult to understand what the variables represent in the code for dummy data.

I am trying to apply the code to the real datasets. As a first step, I checked whether I get the same network parameters (number of proteins, drugs, ...) as the paper. According to the paper, the number of proteins should be 19,085, but from the protein-protein network (bio-decagon-ppi) I get 19,081 proteins. Has anyone tried applying the code to the real dataset, and did you get the same number of proteins for the network? Thanks.
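For anyone who wants to repeat this check, here is a minimal sketch of counting distinct proteins and edges in an edge-list CSV. The column names "Gene 1" and "Gene 2" are an assumption about the header of bio-decagon-ppi.csv, so adjust them to whatever your copy of the file uses. The sample rows are made-up IDs, not real data.

```python
import csv
import io


def count_nodes_and_edges(csv_file, col_a="Gene 1", col_b="Gene 2"):
    """Return (distinct nodes, distinct undirected edges) of an edge-list CSV.

    col_a and col_b are assumed header names; change them if your file differs.
    """
    nodes = set()
    edges = set()
    for row in csv.DictReader(csv_file):
        a, b = row[col_a], row[col_b]
        nodes.update((a, b))
        # frozenset makes (a, b) and (b, a) count as the same edge
        edges.add(frozenset((a, b)))
    return len(nodes), len(edges)


# Tiny in-memory stand-in for the real file (made-up gene IDs).
sample = io.StringIO("Gene 1,Gene 2\n1,2\n2,1\n2,3\n")
n_nodes, n_edges = count_nodes_and_edges(sample)
print(n_nodes, n_edges)
```

With the real file you would pass an open file handle instead, for example `open("bio-decagon-ppi.csv", newline="")` (a hypothetical local path). Note that this sketch deduplicates reversed pairs, so its edge count could differ from a plain row count if the file lists an edge in both directions.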


I am also confused about how to apply the model to the real data. Has anyone solved the problem? Thanks.

Same problem for me, not quite sure how to apply that.


@vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects, not the 1,318 mentioned in the paper. Do your parameters (number of proteins, drugs, ...) match these? Thanks.

Any updates? Same issue here. We want to reproduce the paper's results.

@chao1224 I was not able to reproduce the paper's results, and I decided to stop working on Decagon for now.

Sorry for the delay; I just saw your comment. As I mentioned, I am not working on Decagon anymore. Here are the numbers from the paper next to the ones I got from the real datasets:

  • Proteins: 19,085 (paper) vs. 19,081 (PPI data)
  • Drugs: 645 (paper) vs. 645 (polypharmacy side effect data, combo)
  • Protein-protein edges: 715,612 (paper) vs. 715,612 (PPI data)
  • Drug-drug edges: 4,651,131 (paper) vs. 4,649,441 (polypharmacy side effect data, combo)
  • Drug-protein edges: 18,596 (paper) vs. 18,690 (drug-target protein data, targets)


@vidarmehr I got it. Thank you so much for your reply.

Thanks for the reply @vidarmehr.

Just want to quickly clarify a number:

  1. In bio-decagon-targets.csv, there are 18,690 interactions.
  2. In bio-decagon-targets-all.csv, there are 131,034 interactions, and 112,438 of them are invalid (not included in the STITCH list or Gene list). Therefore, there are 131,034 - 112,438 = 18,596 valid interactions.
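The filtering described in item 2 can be sketched roughly as below. This is only an illustration under assumptions: `filter_valid_targets` is a hypothetical helper, the (drug, gene) pairs would come from bio-decagon-targets-all.csv, and the two "valid" sets would come from the STITCH list and Gene list mentioned above. The IDs in the example are made up.

```python
def filter_valid_targets(pairs, valid_drugs, valid_genes):
    """Keep only (drug, gene) pairs whose drug AND gene are both known.

    pairs:       iterable of (drug_id, gene_id) tuples
    valid_drugs: drug IDs considered valid (assumed: the STITCH list)
    valid_genes: gene IDs considered valid (assumed: the Gene list)
    """
    valid_drugs = set(valid_drugs)
    valid_genes = set(valid_genes)
    return [(d, g) for d, g in pairs if d in valid_drugs and g in valid_genes]


# Made-up IDs for illustration only.
all_pairs = [
    ("CID1", "10"),   # both valid
    ("CID1", "99"),   # gene not in the gene list
    ("CID9", "10"),   # drug not in the drug list
    ("CID2", "20"),   # both valid
]
valid = filter_valid_targets(all_pairs, {"CID1", "CID2"}, {"10", "20"})
print(len(all_pairs), len(all_pairs) - len(valid), len(valid))
```

On the real files, the three printed numbers would correspond to the total, invalid, and valid interaction counts, which is the same subtraction as 131,034 - 112,438 = 18,596.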

Are there any updates on this issue? I was also unable to reproduce the results in the paper. They say that they only focus on predicting the 964 polypharmacy side effects that each occurred in at least 500 drug combinations. However, the data they provide is the full TWOSIDES dataset. I don't know if they filter out some side effects in the code, but I couldn't find any evidence of this.

@rubjim I can only get 963 side effect types that appear in more than 500 drug combinations. I think the Decagon dataset is so confusing that we cannot use it in our research work.
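For reference, a minimal sketch of this kind of frequency filter is below. It is not the authors' code; `frequent_side_effects` is a hypothetical helper, and each row is assumed to be a (drug A, drug B, side effect) triple like the ones in the combo file. The `inclusive` flag makes the boundary explicit, since "at least 500" and "more than 500" are different thresholds. The example rows are made up.

```python
from collections import defaultdict


def frequent_side_effects(rows, min_combos=500, inclusive=True):
    """Return side effects seen in enough distinct drug pairs.

    rows:      iterable of (drug_a, drug_b, side_effect) triples
    inclusive: True keeps counts >= min_combos, False keeps counts > min_combos
    """
    pairs_by_effect = defaultdict(set)
    for drug_a, drug_b, side_effect in rows:
        # frozenset treats (A, B) and (B, A) as the same combination
        pairs_by_effect[side_effect].add(frozenset((drug_a, drug_b)))

    def keep(n):
        return n >= min_combos if inclusive else n > min_combos

    return {se for se, pairs in pairs_by_effect.items() if keep(len(pairs))}


# Made-up rows for illustration only.
rows = [
    ("D1", "D2", "SE_A"),
    ("D2", "D1", "SE_A"),  # same pair reversed, counted once
    ("D1", "D3", "SE_A"),
    ("D1", "D2", "SE_B"),
]
at_least_two = frequent_side_effects(rows, min_combos=2)
more_than_two = frequent_side_effects(rows, min_combos=2, inclusive=False)
print(sorted(at_least_two), sorted(more_than_two))
```

Running both variants on the real data shows how sensitive the count is to the boundary; whether that explains the 963 vs. 964 difference is not something this thread establishes.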

Was anyone ever able to reproduce the results? Or at least get it running properly?


@Dinxin I agree with you, that's what I also get when I filter the side effects myself. However, they claim to predict 964, which doesn't correspond to the actual numbers in the dataset. @christina-s-wang, at least I wasn't able to do it.

Does no one care about the people asking for help here? I am in the same spot.

To use this code with real data and Python 3.6, try this fork:
https://github.com/DeepVivo/decagon