AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)

Home Page: https://chemicalx.readthedocs.io

Add the DeepDDI model

benedekrozemberczki opened this issue · comments

Dear @hzcheney,

  • Please read the paper first. It is here.
  • After that read the contributing guidelines.
  • If there is an existing open source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is which uses to generate drug representations. Take a look at the layer definition here.
  • The library builds heavily on torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests and add these layers to the main readme.md if needed.
  • Add typing to the initialisation and forward pass.
  • Non data dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?
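As a rough aid for the last bullet, here is a minimal sketch of what the AUC figure measures: the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not part of ChemicalX; in practice a library routine such as `sklearn.metrics.roc_auc_score` would normally be used instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not a ChemicalX result).
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of the 6 pairs are ranked correctly
```

An AUC near 0.5 means the scores rank pairs no better than chance, which is one way to judge whether a reported number is reasonable.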
commented

Hi! 😊 Is this repo open to contributions?

Hi @hzcheney,

We are architecting the data loaders in January 2022 and after that, we will have a board with outstanding features and issues. I will get back to you!

Thank you for your interest! We want to hit KDD 2022 Applied Track.

Benedek

commented

That will be great! Good luck with your paper!

Hi @hzcheney ,

Are you interested in contributing?

commented

@benedekrozemberczki Yeah, I will try.

@YuWVandy what do you think?

commented

Hi @benedekrozemberczki! Sorry about the late response. I have already finished the model part. There is a problem with the input feature named SSP (structural similarity profile): it consists of drug similarity vectors that are computed from the drugs' fingerprints. The problem is that I can't find a straightforward way to calculate the SSP. Any ideas?

It is the following:

  1. For each drug a fingerprint is generated, which gives a D × n matrix, where D is the number of drugs and n is the fingerprint dimensionality.
  2. Using the fingerprints you define a D × D similarity matrix.
  3. You then use PCA to reduce the dimensionality of this similarity matrix.
  4. On my side, this would require adding a key to the dataset which we could use to retrieve the SSP vectors.
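Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say which similarity measure to use, so Tanimoto similarity on binary fingerprints is assumed here; the function names, the random toy fingerprints, and the number of components are all hypothetical; and PCA is done with a plain NumPy SVD rather than any particular library.

```python
import numpy as np


def tanimoto_similarity(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for a binary (D, n) fingerprint matrix."""
    fp = fingerprints.astype(np.float64)
    intersection = fp @ fp.T                 # shared on-bits for every drug pair
    bit_counts = fp.sum(axis=1)              # on-bits per drug
    union = bit_counts[:, None] + bit_counts[None, :] - intersection
    # Two all-zero fingerprints have an empty union; define their similarity as 0.
    return np.divide(
        intersection, union, out=np.zeros_like(intersection), where=union > 0
    )


def pca_reduce(matrix: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of `matrix` onto its top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def structural_similarity_profile(
    fingerprints: np.ndarray, n_components: int
) -> np.ndarray:
    """Steps 2 and 3: similarity matrix (D, D), then PCA to (D, n_components)."""
    similarity = tanimoto_similarity(fingerprints)
    return pca_reduce(similarity, n_components)


# Step 1 stand-in: random bits instead of real fingerprints (D = 6, n = 32).
rng = np.random.default_rng(0)
toy_fingerprints = rng.integers(0, 2, size=(6, 32))
toy_similarity = tanimoto_similarity(toy_fingerprints)
toy_ssp = structural_similarity_profile(toy_fingerprints, n_components=3)
```

Each row of `toy_ssp` would then be the reduced feature vector for one drug, which is what a dataset key (step 4) would need to look up.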

I would say using the drug feature vectors is sufficient to develop this.

I would say don’t consider the drug featurization as part of the model. Whether you use MACCS, Morgan, or SSP shouldn’t make a difference.

So you could just submit the PR to take in whatever drug features are available from the data loader (currently Morgan fingerprints) and in future work we could add different featurizations to the data loader.

Completely agree with @cthoyt about this. It should not be on the model side.
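To make the point concrete, here is a minimal sketch of a pair scorer that only needs to know the drug feature dimensionality, not how the features were produced. It is not the DeepDDI implementation from the PR and does not use the ChemicalX base classes; the class name `PairMLP`, the layer sizes, and the feature width of 256 are hypothetical choices made for illustration.

```python
import torch
from torch import nn


class PairMLP(nn.Module):
    """Hypothetical pair scorer: concatenate two drug feature vectors, apply an MLP."""

    def __init__(
        self,
        drug_channels: int,          # data dependent, so no default value
        hidden_channels: int = 128,  # not data dependent, so a default is given
        hidden_layers: int = 2,
        out_channels: int = 1,
    ) -> None:
        super().__init__()
        layers = []
        in_channels = 2 * drug_channels  # left and right features are concatenated
        for _ in range(hidden_layers):
            layers += [nn.Linear(in_channels, hidden_channels), nn.ReLU()]
            in_channels = hidden_channels
        layers += [nn.Linear(in_channels, out_channels), nn.Sigmoid()]
        self.mlp = nn.Sequential(*layers)

    def forward(
        self,
        drug_features_left: torch.FloatTensor,
        drug_features_right: torch.FloatTensor,
    ) -> torch.FloatTensor:
        pair = torch.cat([drug_features_left, drug_features_right], dim=1)
        return self.mlp(pair)


# Random stand-ins for drug feature vectors; any featurization of width 256 would do.
torch.manual_seed(0)
model = PairMLP(drug_channels=256)
left = torch.rand(4, 256)
right = torch.rand(4, 256)
scores = model(left, right)  # one score per drug pair, shape (4, 1)
```

Swapping Morgan fingerprints for MACCS keys or SSP vectors would then only change the value passed as `drug_channels`.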

@hzcheney Are you going to open a PR with your code?

commented

@hzcheney Are you going to open a PR with your code?

@benedekrozemberczki Yeah! I have already opened a PR. Please review it!