rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

--min-heavies-per-const-frag 3 option loses some transformations

ValeryPolyakov opened this issue · comments

When using the --min-heavies-per-const-frag 3 option during the fragmentation stage, I noticed that I am losing the following transformation:
[*:1]O[*:2] to [*:1]C([*:2])N
in which one of the Rs is a simple methyl group. Is it possible to somehow recover this transformation by adjusting any option during the indexing step?

Hi Valery,

if you use --min-heavies-per-const-frag 3, the
[*:1]O[*:2] >> [*:1]C([*:2])N
transformation will no longer be indexed when one of the R-groups is just a simple methyl, because that fragmentation is no longer created. So there is no way to get back exactly this transformation during indexing.

However, for all pairs of molecules which previously had that double-cut transformation, the single-cut transformation
[*:1]OC >> [*:1]C(C)N
will still be indexed and written to the database/output.

The idea behind the --min-heavies-per-const-frag option was that there are many use cases where having one of the two (or, in other cases, even more) transformations in the DB is sufficient.

Do you have a specific need to use the double-cut transformation rather than the single-cut transformation?

Bests,
Christian

Hi Valery,

I do not think that there is a specific option to remove or recover this transformation during indexing. If you send me two example input SMILES that have the problem, I can try to figure out what is going on here.

Bests,
Christian

Hi Valery,

that depends on what you want to do exactly. The database has a table named 'rule_smiles' which contains all the SMILES of the fragments. In SQLite3, you can check whether a given SMILES is in that table by

"select * from rule_smiles where smiles like '[*:1]OC';"

If you want to find all transformations where that SMILES is involved, you have to query the table named 'rule' with the ID you get from the first query. If you are interested in all transformations + environments where that SMILES is used, you have to query the 'rule_environment' table with the id you get from the query in the 'rule' table.

If you query within the DB directly, I recommend building the DB using the --symmetric option. Otherwise, you have to use the ID for your SMILES in both the LHS and RHS columns.
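
The three-step lookup described above can be sketched in Python with the standard sqlite3 module. This is only an illustration run against a toy in-memory database: the column names from_smiles_id, to_smiles_id, and rule_id are assumptions about the schema and are not taken from this thread, so check them against your own database first (for example with `.schema rule` in the sqlite3 shell). The sketch uses `=` instead of `like`, because SQLite's LIKE is case-insensitive for ASCII and would also match a lowercase (aromatic) spelling of the same characters.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for an mmpdb database.

    The table names come from the thread; the column names are ASSUMED.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT);
        CREATE TABLE rule (id INTEGER PRIMARY KEY,
                           from_smiles_id INTEGER, to_smiles_id INTEGER);
        CREATE TABLE rule_environment (id INTEGER PRIMARY KEY,
                                       rule_id INTEGER, radius INTEGER);
        INSERT INTO rule_smiles VALUES
            (1, '[*:1]OC'), (2, '[*:1]C(C)N'), (3, '[*:1]F');
        INSERT INTO rule VALUES (10, 1, 2), (11, 3, 1);
        INSERT INTO rule_environment VALUES
            (100, 10, 0), (101, 10, 1), (102, 11, 0);
        """
    )
    return conn


def rules_involving(conn, smiles):
    """Return rules that use `smiles` on either the LHS or the RHS."""
    row = conn.execute(
        "SELECT id FROM rule_smiles WHERE smiles = ?", (smiles,)
    ).fetchone()
    if row is None:
        return []
    smiles_id = row[0]
    # Without --symmetric, the SMILES may sit on either side of the rule,
    # so both columns have to be checked.
    return conn.execute(
        "SELECT id, from_smiles_id, to_smiles_id FROM rule "
        "WHERE from_smiles_id = ? OR to_smiles_id = ? ORDER BY id",
        (smiles_id, smiles_id),
    ).fetchall()


def environments_for(conn, rule_id):
    """Return the rule_environment rows that belong to one rule."""
    return conn.execute(
        "SELECT id, rule_id, radius FROM rule_environment "
        "WHERE rule_id = ? ORDER BY id",
        (rule_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for rule in rules_involving(conn, "[*:1]OC"):
        print(rule, environments_for(conn, rule[0]))
```

For a real database, replace build_toy_db() with sqlite3.connect() on your own file and adjust the column names as needed.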

Bests,
Christian

Hi Valery,

yes, it results in an almost two-fold increase of the DB size.

Christian

In mmpdblib/schema.py change the method MMPDatabase.execute from

        if 0:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

to

        if 1:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

That is, change the "0" to a "1". If I recall correctly, that will print out all of the SQL calls and their parameters.
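
If you open the SQLite file yourself rather than going through mmpdb, a similar effect is available without editing any library code: Python's sqlite3 module lets you register a trace callback that receives every SQL statement the connection runs. The sketch below is an alternative technique, not what the change above does, and the helper name connect_with_sql_logging is hypothetical. Whether mmpdb exposes its own connection for this is not covered here.

```python
import sqlite3


def connect_with_sql_logging(path, log):
    """Open a SQLite database and pass every executed statement to `log`."""
    conn = sqlite3.connect(path)
    # set_trace_callback is part of the standard sqlite3 module.
    conn.set_trace_callback(log)
    return conn


# Collect the statements in a list; use `print` as the callback to echo them.
captured = []
conn = connect_with_sql_logging(":memory:", captured.append)
conn.execute("CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT)")
conn.execute(
    "SELECT * FROM rule_smiles WHERE smiles = ?", ("[*:1]OC",)
).fetchall()

for statement in captured:
    print(statement)
```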

The --symmetric flag roughly doubles the database size but reduces the number of required SQL queries.

I am surprised at that. As far as I can tell, the analysis routines all use something like self.mmpa_db.execute, which goes through the method I asked you to modify.

I don't have the time to figure out where those specific calls are being done.

Are you sure that you've modified the right code? That is, sometimes it's hard to figure out if Python is using a modified file vs. the installed package file.
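
One quick way to check which file Python would actually load is to ask the import system for the module's location. This is a minimal sketch using only the standard library; the helper name module_source_path is made up for illustration, and it assumes the package is importable as mmpdblib in the same environment you run mmpdb from.

```python
import importlib.util


def module_source_path(module_name):
    """Return the file Python would load for `module_name`, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Prints the path to mmpdblib/__init__.py, or None if it is not installed
# in this environment. The schema.py to edit sits in the same directory.
print(module_source_path("mmpdblib"))
```

If the printed directory is not the one you edited, the print statements were added to a copy that is never imported.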

All I can suggest is that you trace through the code to find out where those SQL calls are being done, and add the print statements in the correct places. This can be done with the debugger (including the graphical IDE "IDLE" which comes as part of the distribution), among other methods.

Hi Valery,

is this issue still open? If yes, could you post an example that I can use to track down the problem in the code?

Thank you,
Christian

Hi Valery,

I don't know whether you can close it, so I will close this issue. If there are still open questions regarding it, we can reopen it.

Best regards,
Christian