rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Obtain list of matched pairs with common core from an ID.

isohelio opened this issue

Hi, I've built my MMP database for a set of compounds but am struggling to generate the output I would like.

My use case is a pretty typical one: finding changes that lead to large property changes in compounds obtained from patents.

c1ccccc1O X1
c1ccccc1OC X2
c1ccccc1N X3
c1ccccc1OC1CC1 X4
c1cc(Cl)ccc1O X5

What I'm hoping to do is generate a list of matched pairs for a given processed compound, e.g. X1:

c1ccccc1* X1 *O X2 *OC
c1ccccc1* X1 *O X3 *N
c1ccccc1* X1 *O X4 *OC1CC1
etc.

Is this possible?

Thanks
Mike

Hi Mike,

This is not a standard use case that we implemented, so you need a workaround. You can generate a list of all matched pairs if you index to a .csv file rather than a sqlite database. As a next step, you'd then need to match the activities with the pairs in the .csv file and filter to the pairs that you are interested in. This should be rather straightforward with, for example, pandas.

Bests,
Christian
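
For anyone landing here later, the workaround above can be sketched roughly as follows. This is a minimal sketch, not an mmpdb feature. It assumes the file written by `mmpdb index --out csv` is tab-separated with no header row and six columns (SMILES1, SMILES2, id1, id2, transformation, constant), so please check that against your own output. The column names, the helper functions `load_pairs` and `pairs_for_id`, the sample rows, and the property values are all made up for illustration.

```python
import io

import pandas as pd

# ASSUMPTION: column layout of the tab-separated file from `mmpdb index --out csv`
# (no header row). Verify against your own file before relying on it.
COLUMNS = ["smiles1", "smiles2", "id1", "id2", "transform", "constant"]


def load_pairs(source):
    """Read the pair listing from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS)


def pairs_for_id(pairs, compound_id):
    """Return the pairs whose first compound is `compound_id` (hypothetical helper)."""
    hits = pairs[pairs["id1"] == compound_id]
    return hits[["constant", "id1", "id2", "transform"]].reset_index(drop=True)


# Made-up rows standing in for a real index output file.
sample = (
    "Oc1ccccc1\tCOc1ccccc1\tX1\tX2\t[*:1]O>>[*:1]OC\t[*:1]c1ccccc1\n"
    "COc1ccccc1\tOc1ccccc1\tX2\tX1\t[*:1]OC>>[*:1]O\t[*:1]c1ccccc1\n"
    "Oc1ccccc1\tNc1ccccc1\tX1\tX3\t[*:1]O>>[*:1]N\t[*:1]c1ccccc1\n"
    "COc1ccccc1\tNc1ccccc1\tX2\tX3\t[*:1]OC>>[*:1]N\t[*:1]c1ccccc1\n"
)
pairs = load_pairs(io.StringIO(sample))

# Keep only the pairs that start from X1.
x1_pairs = pairs_for_id(pairs, "X1")

# Made-up property values keyed by compound ID; delta is property(id2) - property(id1).
props = pd.Series({"X1": 5.0, "X2": 6.5, "X3": 5.5})
x1_pairs["delta"] = x1_pairs["id2"].map(props) - x1_pairs["id1"].map(props)

print(x1_pairs.to_string(index=False))
```

With a real file you would call `load_pairs("your_output.csv")` instead of using the in-memory sample, and sort by the absolute value of `delta` to surface the largest property changes.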

Hi Christian,

Thanks for the pointer, I've managed to pull together the values I need from the csv output.

This would seem to be a useful feature for the main application? It's pretty much the first thing I do with matched pairs: just list the relationships within a set of compounds.

Thanks again,
Mike

Is your organization interested in funding that development, with the results contributed back to mmpdb?

Hi Andrew,

Not for functionality like this I'm afraid.

The CSV file output contains the exact information needed, which will work for small datasets.
It seems like a pretty useful addition to be able to recreate that for a list of provided IDs directly from the sqlite database.

Thanks,
Mike