RDKit port
OliverBScott opened this issue · comments
Hi there, thanks for sharing this code and also the code for shape-it, it is really great to see people supporting the open-source community! I have been meaning to try to learn/use rdkit in c++ for a while and this gave me an excuse. My attempt can be found here https://github.com/OliverBScott/align-it , and also has Python wrappers (pyalignit). I am sure there are still a few bugs so I would be grateful for any contributions!
When working on this I also noticed that the ported code produces different pharmacophores for molecules with and without hydrogens. With the original code, this doesn't happen (hydrogen agnostic). I think the issue is due to a couple of functions that have been changed incorrectly. For example in hDonFuncCalc.cpp the original code (ob2) at line 43 is:
if (a->GetFormalCharge() >= 0 && ((a->GetImplicitValence() - a->GetHvyValence()) !=0))
This line seems to be checking whether there are any hydrogen atoms connected. The port replaces GetImplicitValence()
with GetImplicitHCount()
. While this makes sense logically it seems that the GetImplicitValence()
function does not do what you would expect, from the documentation:
Returns: The implicit valence of this atom type (i.e. maximum number of connections expected)
while GetImplicitHCount()
seems to do what you would expect
, from the documentation:
Returns: The number of non-hydrogens connected to this atom
i.e. if the expected number of connections of a nitrogen atom is 3 and its heavy degree is 2 then we expect it to have 1 hydrogen atom.
I think that to get the equivalent behaviour with openbabel3 one could use the following to get the number of hydrogens while remaining agnostic to whether hydrogens are explicitly included:
a->GetImplicitHCount() + a->ExplicitHydrogenCount()
and thus the line becomes:
if (a->GetFormalCharge() >= 0 && ((a->GetImplicitHCount() + a->ExplicitHydrogenCount()) !=0))
I think that openbabel2 had very confusing function names regarding valence and H counting!
There are a few more cases of this particular function in the files:
- hAccFuncCalc.cpp
- lipoFuncCalc.cpp
I have attempted to fix these here but it is possible that there may still be issues.
Thanks again!
Oliver
Hi @OliverBScott
Thanks for your issue!
It makes sense. I'll check code.
Best,
@iwatobipen