Can you supply instructions on how to use real-world data to train the model?

Question

Dinxin opened this issue 6 years ago · comments

I went through the whole code and found that it only uses toy dummy data to train the model, so I don't really understand how the GCN model is trained on real data. Can you supply the code or instructions for training the model on real-world data?

Dan Ofer · Answer 1 · Tue Aug 21 2018 16:47:14 GMT+0800 (China Standard Time)

It's also not clear how to get predictions from the trained model on new data or a new pair of drugs. Do I put in SIDER codes? STITCH? Other codes? In what format?

Colin Wu · Answer 2 · Fri Nov 02 2018 16:09:07 GMT+0800 (China Standard Time)

I am also confused about how to apply the model to the real data.

Mannu Sanghi · Answer 3 · Mon Nov 05 2018 07:13:47 GMT+0800 (China Standard Time)

Please give instructions on how to apply the actual dataset in the code. It is very difficult to understand what the variables represent in the code for dummy data.

Vida Ravanmehr · Answer 4 · Thu Nov 22 2018 04:16:58 GMT+0800 (China Standard Time)

I am trying to apply the code to the real datasets. As a first step, I checked whether I get the same parameters (number of proteins, drugs, ...) for the network. According to the paper, the number of proteins should be 19,085, but from the protein-protein network (bio-decagon-ppi) I get 19,081 proteins. Has anyone tried applying the code to the real dataset, and did you get the same number of proteins for the network? Thanks.
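A minimal sketch of how such a count could be checked, using only the standard library. It counts the distinct node IDs and the edge rows in an edge-list CSV. The column names `Gene 1` and `Gene 2` and the helper name `count_nodes_and_edges` are assumptions for illustration, and the sample rows are made up; adjust the column names to whatever header the downloaded file actually has.

```python
import csv
import io


def count_nodes_and_edges(csv_file, col_a="Gene 1", col_b="Gene 2"):
    """Return (number of distinct node IDs, number of edge rows).

    col_a and col_b are assumed header names, not confirmed ones.
    """
    reader = csv.DictReader(csv_file)
    nodes = set()
    edges = 0
    for row in reader:
        # Every ID that appears on either side of an edge is a node.
        nodes.add(row[col_a])
        nodes.add(row[col_b])
        edges += 1
    return len(nodes), edges


# Tiny in-memory stand-in for the real file (IDs are made up).
sample = io.StringIO("Gene 1,Gene 2\n1,2\n2,3\n3,1\n4,1\n")
n_nodes, n_edges = count_nodes_and_edges(sample)
print(n_nodes, n_edges)
```

To run it on the real data, pass an open file handle instead of the `StringIO` object, for example `open("bio-decagon-ppi.csv", newline="")` if that is how the file is named locally. Note that this counts only proteins that appear in at least one edge.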

bbjy · Answer 5 · Tue Feb 26 2019 11:12:49 GMT+0800 (China Standard Time)

I am also confused about how to apply the model to the real data. Has anyone solved the problem? Thanks.

West Zhican Chen · Answer 6 · Wed Mar 13 2019 02:20:50 GMT+0800 (China Standard Time)

Same problem for me, not quite sure how to apply that.

bbjy · Answer 7 · Wed Apr 03 2019 22:05:44 GMT+0800 (China Standard Time)

@vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects, which is not the same as in the paper (1,318). Do your parameters (number of proteins, drugs, ...) match these? Thanks.

Shengchao Liu · Answer 8 · Mon Jul 01 2019 04:57:46 GMT+0800 (China Standard Time)

Any updates? Same issue here. We want to reproduce the paper's results.

Vida Ravanmehr · Answer 9 · Mon Jul 01 2019 23:31:31 GMT+0800 (China Standard Time)

@chao1224 I was not able to reproduce the results of the paper, and I decided to stop working on Decagon for now.

Vida Ravanmehr · Answer 10 · Tue Jul 02 2019 03:12:00 GMT+0800 (China Standard Time)

> @vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects, which is not the same as in the paper (1,318). Do your parameters (number of proteins, drugs, ...) match these? Thanks.

Sorry for the delay; I just saw your comment. As I mentioned, I am not working on Decagon anymore. Here are the numbers I got from the paper and from the real datasets:
Number of proteins: 19,085 (paper) vs. 19,081 (PPI data)
Number of drugs: 645 (paper) vs. 645 (polypharmacy side effect data (combo))
Number of protein-protein edges: 715,612 (paper) vs. 715,612 (PPI data)
Number of drug-drug edges: 4,651,131 (paper) vs. 4,649,441 (polypharmacy side effect data (combo))
Number of drug-protein edges: 18,596 (paper) vs. 18,690 (drug-target protein data (targets))
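For reference, the gaps between the two sets of numbers can be tabulated with a few lines of Python. This is only arithmetic on the figures quoted in this comment; the dictionary keys are labels chosen here for illustration.

```python
# (paper, downloaded data) pairs, copied from the comment above.
counts = {
    "proteins": (19_085, 19_081),
    "drugs": (645, 645),
    "protein-protein edges": (715_612, 715_612),
    "drug-drug edges": (4_651_131, 4_649_441),
    "drug-protein edges": (18_596, 18_690),
}

# Difference is data minus paper, so a negative value means the
# downloaded data contains fewer items than the paper reports.
diffs = {name: data - paper for name, (paper, data) in counts.items()}

for name, diff in diffs.items():
    paper, data = counts[name]
    print(f"{name}: paper={paper:,} data={data:,} diff={diff:+,}")
```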

bbjy · Answer 11 · Tue Jul 02 2019 09:13:44 GMT+0800 (China Standard Time)

@vidarmehr I got it. Thank you so much for your reply.

Shengchao Liu · Answer 12 · Tue Jul 02 2019 12:51:54 GMT+0800 (China Standard Time)

Thanks for the reply @vidarmehr.

Just want to quickly clarify a number:

In bio-decagon-targets.csv, there are 18,690 interactions.
In bio-decagon-targets-all.csv, there are 131,034 interactions, of which 112,438 are invalid (not included in the STITCH list or Gene list). Therefore, there are 131,034 - 112,438 = 18,596 valid interactions.
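The kind of filtering described above can be sketched as follows: keep a drug-target pair only if the drug is in the set of known STITCH IDs and the gene is in the set of known gene IDs. The function name `filter_valid_targets`, the way the two ID sets are obtained, and all sample IDs are assumptions for illustration; the thread does not say how the lists were built.

```python
def filter_valid_targets(pairs, valid_drugs, valid_genes):
    """Keep (drug, gene) pairs whose drug and gene are both known."""
    return [
        (drug, gene)
        for drug, gene in pairs
        if drug in valid_drugs and gene in valid_genes
    ]


# Made-up sample data standing in for the real drug-target file.
pairs = [("CID1", "10"), ("CID2", "20"), ("CID9", "10"), ("CID1", "99")]
valid_drugs = {"CID1", "CID2"}   # hypothetical STITCH list
valid_genes = {"10", "20"}       # hypothetical gene list

kept = filter_valid_targets(pairs, valid_drugs, valid_genes)
n_total = len(pairs)
n_invalid = n_total - len(kept)
print(n_total, n_invalid, len(kept))

# The same bookkeeping with the numbers reported in this comment.
assert 131_034 - 112_438 == 18_596
```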

Ruben · Answer 13 · Tue Nov 12 2019 21:56:39 GMT+0800 (China Standard Time)

Are there any updates on this issue? I was also unable to reproduce the results in the paper. They say that they only focus on predicting the 964 polypharmacy side effects that each occurred in at least 500 drug combinations. However, the data they provide is the full TWOSIDES dataset. I don't know if they filter out some side effects in the code, but I couldn't find any evidence of this.
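A minimal sketch of the filter the paper describes, assuming the combination data can be read as (drug, drug, side effect) triples. It counts the distinct unordered drug pairs per side effect and keeps the side effects that reach a threshold. The function name `frequent_side_effects`, the `min_pairs` parameter, and the sample IDs are hypothetical; this is not taken from the Decagon code.

```python
from collections import defaultdict


def frequent_side_effects(triples, min_pairs=500):
    """Return side effects seen in at least `min_pairs` distinct drug pairs."""
    pairs_per_effect = defaultdict(set)
    for drug_a, drug_b, effect in triples:
        # frozenset makes the pair unordered, so (A, B) and (B, A)
        # are counted as the same drug combination.
        pairs_per_effect[effect].add(frozenset((drug_a, drug_b)))
    return {
        effect
        for effect, pairs in pairs_per_effect.items()
        if len(pairs) >= min_pairs  # inclusive: "at least" min_pairs
    }


# Made-up sample; a small threshold is used so the effect is visible.
triples = [
    ("A", "B", "C001"),
    ("B", "A", "C001"),  # same pair as above, counted once
    ("A", "C", "C001"),
    ("A", "B", "C002"),
]
selected = frequent_side_effects(triples, min_pairs=2)
print(sorted(selected))
```

The comparison is written as `>=` to match the wording "at least 500"; a strict `>` would implement "more than 500" instead.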

Dinxin · Answer 14 · Tue Nov 26 2019 18:05:08 GMT+0800 (China Standard Time)

@rubjim I can only get 963 side effect types that appear in more than 500 drug combinations. I find the Decagon dataset so confusing that we could not use it in our research work.

Christina · Answer 15 · Sun Dec 13 2020 03:18:35 GMT+0800 (China Standard Time)

Was anyone ever able to reproduce the results? Or at least get it running properly?

Ruben · Answer 16 · Sun Dec 13 2020 20:06:08 GMT+0800 (China Standard Time)

@Dinxin I agree with you, that's what I also get when I filter the side effects myself. However, they claim to predict 964, which doesn't correspond to the actual numbers in the dataset. @christina-s-wang At least I wasn't able to do it.

maryamag85 · Answer 17 · Fri Sep 03 2021 07:01:23 GMT+0800 (China Standard Time)

Does no one care about the people asking for help here? I am in the same spot.

avi-pomicell · Answer 18 · Sun Dec 18 2022 16:08:58 GMT+0800 (China Standard Time)

To use this code with real data and Python 3.6, try this fork:
https://github.com/DeepVivo/decagon