EBjerrum / SMILES-enumeration

SMILES enumeration for QSAR modelling using LSTM recurrent neural networks

How to use it for small SMILES like the aspirin SMILES: CC(=O)Oc1ccccc1C(=O)O

Fatima-Aslam opened this issue

Whenever I pass in any small SMILES string, I get:
[22:10:14] Can't kekulize mol. Unkekulized atoms: 2 3 4 5 6
Traceback (most recent call last):
  File "SmilesEnumerator.py", line 272, in <module>
    from SmilesEnumerator import SmilesEnumerator
  File "D:\MSCS (fatima)\medicinal plants\Research task (August)\Synopsis\SmilesEnumerationCode - Copy\SMILES-enumeration-master (1)\SMILES-enumeration-master\SmilesEnumerator.py", line 276, in <module>
    print(sme.randomize_smiles("COc1cnc(nC1N(C)C)c2ccccc2"))
  File "D:\MSCS (fatima)\medicinal plants\Research task (August)\Synopsis\SmilesEnumerationCode - Copy\SMILES-enumeration-master (1)\SMILES-enumeration-master\SmilesEnumerator.py", line 170, in randomize_smiles
    ans = list(range(m.GetNumAtoms()))
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Please help me resolve this issue.

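I don't think your SMILES is being parsed by RDKit. Every SMILES must be parsable by RDKit; when parsing fails, Chem.MolFromSmiles returns None, which is what causes the AttributeError above.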

Chem.MolFromSmiles("COc1cnc(nC1N(C)C)c2ccccc2")
RDKit ERROR: [13:54:15] Can't kekulize mol. Unkekulized atoms: 2 3 4 5 6
RDKit ERROR:

In this particular instance, I think you need to tell RDKit which aromatic nitrogen carries the hydrogen. In SMILES, an aromatic nitrogen with an attached hydrogen is written as [nH].
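One way to catch this kind of problem before calling randomize_smiles is to check each SMILES for parsability first. The sketch below is only an illustration: the names rdkit_parse and split_parsable are hypothetical and not part of SmilesEnumerator.py. It relies on Chem.MolFromSmiles returning None for a string RDKit cannot parse, and the parser is passed in as an argument so the helper can also be exercised without RDKit.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rdkit_parse(smiles: str) -> Optional[object]:
    """Parse with RDKit; the result is None when the SMILES cannot be parsed."""
    # Imported lazily so split_parsable itself has no hard RDKit dependency.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles)


def split_parsable(
    smiles_list: Iterable[str],
    parse: Callable[[str], Optional[object]] = rdkit_parse,
) -> Tuple[List[str], List[str]]:
    """Split SMILES strings into (parsable, rejected) using the given parser."""
    parsable: List[str] = []
    rejected: List[str] = []
    for smi in smiles_list:
        if parse(smi) is not None:
            parsable.append(smi)
        else:
            rejected.append(smi)
    return parsable, rejected
```

With RDKit installed, calling split_parsable(["CC(=O)Oc1ccccc1C(=O)O", "COc1cnc(nC1N(C)C)c2ccccc2"]) should put the aspirin SMILES in the first list and, going by the kekulization error shown above, the second string in the rejected list. Only the strings in the first list should then be handed to the enumerator.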

Data augmentation with SMILES enumeration is all about generating alternative SMILES for the same molecule.
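As a rough illustration of that idea, the sketch below shuffles the atom order, renumbers the molecule, and writes a non-canonical SMILES, in the spirit of the randomize_smiles line visible in the traceback. The function name enumerate_smiles and its parameters n and seed are assumptions made for this example, not the repository's API, and it fails early with a clear message when RDKit cannot parse the input.

```python
import random
from typing import List


def enumerate_smiles(smiles: str, n: int = 5, seed: int = 0) -> List[str]:
    """Return up to n distinct SMILES strings describing the same molecule."""
    from rdkit import Chem  # requires RDKit to be installed

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Avoids the later "'NoneType' object has no attribute" error.
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")

    rng = random.Random(seed)
    order = list(range(mol.GetNumAtoms()))
    variants = set()
    for _ in range(n * 10):  # bounded number of attempts
        rng.shuffle(order)
        shuffled = Chem.RenumberAtoms(mol, order)
        # canonical=False keeps the shuffled atom order in the output string.
        variants.add(Chem.MolToSmiles(shuffled, canonical=False))
        if len(variants) >= n:
            break
    return sorted(variants)
```

Calling enumerate_smiles("CC(=O)Oc1ccccc1C(=O)O") should give several different strings that all parse back to aspirin. Very small molecules have only a few distinct SMILES, so fewer than n strings may be returned.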