--min-heavies-per-const-frag 3 option loses some transformations

ValeryPolyakov opened this issue

When using the --min-heavies-per-const-frag 3 option during the fragmentation stage, I noticed that I am losing the following transformation:
[*:1]O[*:2] to [*:1]C([*:2])N
in which one of the R groups is a simple methyl group. Is it possible to recover this transformation by changing an option during the indexing step?

Christian Kramer · Answer 1 · Tue Mar 12 2019 16:17:38 GMT+0800 (China Standard Time)

Hi Valery,

if you use --min-heavies-per-const-frag 3, the
[*:1]O[*:2] >> [*:1]C([*:2])N
transformation will no longer be indexed if one of the R groups is just a methyl, because that fragmentation is no longer created. So there is no way to get back exactly this transformation during indexing.

However, for all pairs of molecules which previously had that double-cut transformation, the single-cut transformation
[*:1]OC >> [*:1]C(C)N
will still be indexed and written to the database/output.
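To make the filtering behaviour concrete, here is a toy model in Python. It is not mmpdb's code: the `Fragmentation` class, the `passes_filter` function and the heavy-atom counts are hypothetical, and the sketch assumes the option simply rejects any fragmentation in which at least one constant fragment has fewer heavy atoms than the threshold, as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragmentation:
    """Hypothetical record of one way of cutting a molecule (not an mmpdb class)."""
    num_cuts: int
    variable_smiles: str
    constant_heavies: List[int]  # heavy-atom count of each constant fragment


def passes_filter(frag: Fragmentation, min_heavies_per_const_frag: int) -> bool:
    """Keep a fragmentation only if every constant fragment is large enough."""
    return all(n >= min_heavies_per_const_frag for n in frag.constant_heavies)


# Hypothetical molecule R1-O-CH3 in which R1 has 10 heavy atoms.
double_cut = Fragmentation(2, "[*:1]O[*:2]", [10, 1])  # constants: R1 and the methyl
single_cut = Fragmentation(1, "[*:1]OC", [10])          # constant: R1 only

for frag in (double_cut, single_cut):
    print(frag.num_cuts, frag.variable_smiles, passes_filter(frag, 3))
# prints:
# 2 [*:1]O[*:2] False
# 1 [*:1]OC True
```

With a threshold of 3 the double cut is discarded because the methyl constant has only one heavy atom, while the single cut survives because its only constant fragment is R1.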

The idea behind the --min-heavies-per-const-frag option is that in many use cases it is sufficient to have only one of the two (or, in other cases, more) equivalent transformations in the DB.

Do you have a specific need to use the double-cut transformation rather than the single-cut transformation?

Bests,
Christian

Valery R Polyakov · Answer 2 · Tue Mar 12 2019 19:55:59 GMT+0800 (China Standard Time)

Thanks, Christian. Do you know why I did not see the single-cut transformation in the output? Is there a specific option during indexing to recover it? Valery

Christian Kramer · Answer 3 · Tue Mar 12 2019 20:28:20 GMT+0800 (China Standard Time)

Hi Valery,

I do not think that there is a specific option to remove or recover this transformation during indexing. If you send me two example input SMILES that show the problem, I can try to figure out what is going on here.

Bests,
Christian

Valery R Polyakov · Answer 4 · Tue Mar 12 2019 20:38:06 GMT+0800 (China Standard Time)

Thanks. I need to think about it. Obviously, I cannot send the actual compound...

Valery R Polyakov · Answer 5 · Wed Mar 13 2019 01:59:15 GMT+0800 (China Standard Time)

Hi Christian, in which database table should I look for single- and double-cut SMILES such as [*:1]O[*:2] or [*:1]OC? Valery

Christian Kramer · Answer 6 · Wed Mar 13 2019 18:14:55 GMT+0800 (China Standard Time)

Hi Valery,

that depends on what you want to do exactly. The database has a table named 'rule_smiles' which contains all the SMILES of the fragments. In SQLite3, you can check whether a given SMILES is in that table with

    select * from rule_smiles where smiles like '[*:1]OC';

If you want to find all transformations in which that SMILES is involved, you have to query the table named 'rule' with the ID you get from the first query. If you are interested in all transformations and their environments in which that SMILES is used, you have to query the 'rule_environment' table with the IDs you get from the query on the 'rule' table.

If you query the DB directly, I recommend building it with the --symmetric option. Otherwise, you have to look for the ID of your SMILES in both the LHS and RHS columns.
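A minimal sketch of the lookup chain described above, using Python's built-in sqlite3 module on an in-memory toy database. The table names rule_smiles, rule and rule_environment come from this thread; the column names (from_smiles_id, to_smiles_id, rule_id, radius) and the helper `rules_for_smiles` are assumptions for illustration, so check them against your own .mmpdb file (for example with `.schema` in the sqlite3 shell) before reusing the query. The sketch matches with `=` rather than `like`, and its `symmetric` flag shows why a DB built without --symmetric has to be searched on both sides of each rule.

```python
import sqlite3

# Toy schema: table names follow the thread; column names are assumptions.
SCHEMA = """
CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
CREATE TABLE rule (
    id INTEGER PRIMARY KEY,
    from_smiles_id INTEGER REFERENCES rule_smiles(id),
    to_smiles_id INTEGER REFERENCES rule_smiles(id)
);
CREATE TABLE rule_environment (
    id INTEGER PRIMARY KEY,
    rule_id INTEGER REFERENCES rule(id),
    radius INTEGER
);
"""


def rules_for_smiles(conn, smiles, symmetric=False):
    """Return (rule_environment id, from SMILES, to SMILES, radius) rows for `smiles`."""
    # Step 1: find the fragment's ID in rule_smiles.
    row = conn.execute(
        "SELECT id FROM rule_smiles WHERE smiles = ?", (smiles,)
    ).fetchone()
    if row is None:
        return []
    # Steps 2 and 3: go from rule to rule_environment. Without --symmetric a
    # fragment may appear on either side of a rule, so both columns are checked.
    if symmetric:
        where = "r.from_smiles_id = :id"
    else:
        where = "(r.from_smiles_id = :id OR r.to_smiles_id = :id)"
    sql = f"""
        SELECT re.id, fs.smiles, ts.smiles, re.radius
          FROM rule AS r
          JOIN rule_environment AS re ON re.rule_id = r.id
          JOIN rule_smiles AS fs ON fs.id = r.from_smiles_id
          JOIN rule_smiles AS ts ON ts.id = r.to_smiles_id
         WHERE {where}
         ORDER BY re.id
    """
    return conn.execute(sql, {"id": row[0]}).fetchall()


# Toy, non-symmetric data: [*:1]OC is the LHS of rule 1 and the RHS of rule 2.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO rule_smiles VALUES (?, ?)",
                 [(1, "[*:1]OC"), (2, "[*:1]C(C)N"), (3, "[*:1]Cl")])
conn.executemany("INSERT INTO rule VALUES (?, ?, ?)", [(1, 1, 2), (2, 3, 1)])
conn.executemany("INSERT INTO rule_environment VALUES (?, ?, ?)",
                 [(10, 1, 0), (11, 1, 1), (12, 2, 0)])

for r in rules_for_smiles(conn, "[*:1]OC"):
    print(r)
```

Calling `rules_for_smiles(conn, "[*:1]OC", symmetric=True)` on this non-symmetric toy data returns only environments 10 and 11, i.e. it misses the rule in which the fragment sits on the right-hand side. That is the query saving a --symmetric build buys, at the cost of storing every rule in both directions.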

Bests,
Christian

Valery R Polyakov · Answer 7 · Thu Mar 14 2019 00:46:40 GMT+0800 (China Standard Time)

Hi Christian, thanks. The query works, although it is a little slow. Does the --symmetric option increase the DB size? Valery

Christian Kramer · Answer 8 · Thu Mar 14 2019 17:44:04 GMT+0800 (China Standard Time)

Hi Valery,

yes, it almost doubles the DB size.

Christian

Valery R Polyakov · Answer 9 · Tue Mar 19 2019 01:24:03 GMT+0800 (China Standard Time)

Hi Christian, can you tell me which SQL commands are executed by the following command?

    python mmpdb predict --smiles "smiles1" --reference "smiles2" --property "propName" --save-details --prefix noOptions master_full.mmpdb > noOptions.txt

Obviously, I use real SMILES in place of smiles1 and smiles2. Thanks a lot, Valery

Andrew Dalke · Answer 10 · Tue Mar 19 2019 16:45:14 GMT+0800 (China Standard Time)

In mmpdblib/schema.py change the method MMPDatabase.execute from

        if 0:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

to

        if 1:
            import time
            print("EXECUTE")
            print(sql)
            print(repr(args))

That is, change the "0" to a "1". If I recall correctly, that will print out all of the SQL calls and their parameters.
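As an alternative that does not require editing the installed package, Python's standard sqlite3 module can report every statement a connection executes through `Connection.set_trace_callback`. This is a different technique from the patch above, and it assumes you can get hold of the sqlite3 connection that mmpdb opens (for example from your own script); `open_with_sql_logging` is a hypothetical helper. Depending on the Python and SQLite versions, bound parameter values may or may not appear inline in the logged text.

```python
import sqlite3


def open_with_sql_logging(path, log=print):
    """Open an SQLite connection that passes each executed SQL statement to `log`."""
    conn = sqlite3.connect(path)
    # sqlite3 calls the trace callback with the text of each statement it executes.
    conn.set_trace_callback(log)
    return conn


statements = []
conn = open_with_sql_logging(":memory:", log=statements.append)
conn.execute("CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT)")
conn.execute("INSERT INTO rule_smiles (smiles) VALUES (?)", ("[*:1]OC",))
conn.execute("SELECT id FROM rule_smiles WHERE smiles = ?", ("[*:1]OC",)).fetchone()

for sql in statements:
    print(sql)
```

Passing `log=print` (the default) writes each statement to stdout as it runs, which is closer to what the `if 1:` patch does.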

The --symmetric flag roughly doubles the database size but reduces the number of required SQL queries.

Valery R Polyakov · Answer 11 · Tue Mar 19 2019 21:55:38 GMT+0800 (China Standard Time)

Thanks. I will try that.

Valery R Polyakov · Answer 12 · Thu Mar 21 2019 01:25:50 GMT+0800 (China Standard Time)

Hi Andrew, I did that, but the SQL statements are not being printed... Valery R. Polyakov

Andrew Dalke · Answer 13 · Thu Mar 21 2019 18:09:11 GMT+0800 (China Standard Time)

I am surprised at that. As far as I can tell, the analysis routines all use something like self.mmpa_db.execute, which goes through the method I asked you to modify.

I don't have the time to figure out where those specific calls are being done.

Are you sure that you've modified the right code? That is, sometimes it's hard to figure out if Python is using a modified file vs. the installed package file.
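One quick way to check which file Python actually loads is to ask the import system directly. The sketch below uses the standard `importlib.util.find_spec`; the helper name `module_location` is hypothetical, and the module name to check (mmpdblib.schema) is taken from this thread.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file Python would load for module `name`, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'mmpdblib' in 'mmpdblib.schema') is missing.
        return None
    return spec.origin if spec is not None else None


print(module_location("json.decoder"))
```

Run `module_location("mmpdblib.schema")` with the same Python interpreter that runs mmpdb; if the path it prints is not the file you edited, the change has no effect on the installed command.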

All I can suggest is that you trace through the code to find out where those SQL calls are being done, and add the print statements in the correct places. This can be done with the debugger (including the graphical IDE "IDLE" which comes as part of the distribution), among other methods.

Christian Kramer · Answer 14 · Mon May 20 2019 14:28:10 GMT+0800 (China Standard Time)

Hi Valery,

is this issue still open? If yes, could you post an example that I can use to trace down the problem in the code?

Thank you,
Christian

Valery R Polyakov · Answer 15 · Mon May 20 2019 16:12:12 GMT+0800 (China Standard Time)

Christian and Andrew, I was able to print the statements. Thanks. By the way, is there any way for me to close the issue? Valery R. Polyakov

Christian Kramer · Answer 16 · Mon May 20 2019 19:00:56 GMT+0800 (China Standard Time)

Hi Valery,

I don't know whether you can close it. I will close this issue. If there are still questions open regarding this issue, we can open it again.

Best regards,
Christian

Valery R Polyakov · Answer 17 · Mon May 20 2019 19:05:50 GMT+0800 (China Standard Time)

Thanks