wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs

Generation example not working

cristianregep opened this issue

I downloaded the package and ran the suggested commands from the generation folder:
python get_vocab.py --min_frequency 100 --ncpu 8 < ../data/polymers/all.txt > ../data/polymers/vocab.txt
python preprocess.py --train ../data/polymers/train.txt --vocab data/polymers/vocab.txt --ncpu 8

I get the following error:
"""
Traceback (most recent call last):
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "preprocess.py", line 19, in tensorize
x = MolGraph.tensorize(mol_batch, vocab, common_atom_vocab)
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 168, in tensorize
tree_tensors, tree_batchG = MolGraph.tensorize_graph([x.mol_tree for x in mol_batch], vocab)
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 209, in tensorize_graph
fnode[v] = vocab[attr]
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/vocab.py", line 43, in __getitem__
return self.hmap[x[0]], self.vmap[x]
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "preprocess.py", line 49, in <module>
all_data = pool.map(func, batches)
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')

I traced the issue to preprocess.py, which loads the motifs from the vocab file instead of loading the original motifs that pass the min_frequency threshold in get_vocab.py:
MolGraph.load_fragments([x[0] for x in vocab])

I worked around this by saving the original fragments to a separate file after running get_vocab.py and then loading them in preprocess.py. What I think is happening is that molecules are not split in the same way because the starting fragments differ.
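To make the failure mode concrete, here is a minimal sketch of the lookup shown on the failing line in vocab.py (`return self.hmap[x[0]], self.vmap[x]`). `ToyPairVocab` and `explain_lookup` are hypothetical names written for illustration, not the repository's code, and the sketch assumes `hmap` and `vmap` behave like plain dicts. Under that assumption, a KeyError that carries the full pair rather than a single motif string suggests the motif itself was found and only the anchored form was missing.

```python
class ToyPairVocab:
    """Hypothetical stand-in for the lookup in the traceback; not the repo's class."""

    def __init__(self, pairs):
        # pairs: iterable of (motif_smiles, anchored_smiles) tuples
        pairs = list(pairs)
        motifs = sorted({motif for motif, _ in pairs})
        self.hmap = {motif: i for i, motif in enumerate(motifs)}
        self.vmap = {pair: i for i, pair in enumerate(pairs)}

    def __getitem__(self, pair):
        # Same shape as the failing line: motif lookup first, then pair lookup.
        return self.hmap[pair[0]], self.vmap[pair]


def explain_lookup(vocab, pair):
    """Report which of the two lookups would fail for a (motif, anchored) pair."""
    if pair[0] not in vocab.hmap:
        return "motif missing"
    if pair not in vocab.vmap:
        return "anchored form missing"
    return "ok"


# The second element here is a placeholder string, not a real vocab entry.
toy_vocab = ToyPairVocab([("C1=CSC=N1", "placeholder-anchored-form")])
failing_pair = ("C1=CSC=N1", "N1=[CH:2]S[CH:2]=[CH:1]1")
print(explain_lookup(toy_vocab, failing_pair))
```

Running a check like this over the pairs produced during preprocessing would show whether whole motifs or only particular anchored forms are absent from the vocab file.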

Hi,

I fixed this issue and now it should be able to run. Thank you!

Hi, when I tried to run the generation example, I got a similar error, shown below.
Could you check this error? @wengong-jin

code:
python preprocess.py --train ../data/polymers/train.txt --vocab ../data/polymers/inter_vocab.txt --ncpu 8

error:
Traceback (most recent call last):
File "preprocess.py", line 48, in <module>
all_data = pool.map(func, batches)
File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: ('CN1C(=O)C2=C3C(=C(F)C=C4C(=O)N(C)C(=O)C(=C43)C(F)=C2)C1=O', 'CN1C(=O)C2=CC(F)=C3C(=O)N(C)C(=O)C4=C3C2=C(C1=O)C(F)=[CH:1]4')

Hi,

I tried running the same command and there was no error. I suggest running get_vocab.py and checking whether its output differs from data/polymers/inter_vocab.txt. If they differ (I would be surprised), please try rerunning preprocess.py and see if it succeeds.
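One way to do that comparison is sketched below. It assumes each vocab line holds two whitespace-separated SMILES strings (a motif and its anchored form), which matches the pairs in the KeyError messages above but is not confirmed here; `read_pairs` and `diff_vocab` are hypothetical helper names.

```python
def read_pairs(lines):
    """Parse (motif, anchored motif) pairs from vocab lines, skipping blank or short lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def diff_vocab(fresh_lines, shipped_lines):
    """Return (pairs only in the fresh vocab, pairs only in the shipped vocab)."""
    fresh = read_pairs(fresh_lines)
    shipped = read_pairs(shipped_lines)
    return sorted(fresh - shipped), sorted(shipped - fresh)


# Tiny made-up inputs, only to show the shape of the result.
only_fresh, only_shipped = diff_vocab(
    ["A a1\n", "B b1\n"],
    ["A a1\n", "C c1\n", "\n"],
)
print(only_fresh, only_shipped)
```

Passing open file objects for the freshly generated vocab and for data/polymers/inter_vocab.txt in place of the lists would list the entries that differ. Two empty lists would mean the files contain the same pairs, and the cause is more likely elsewhere, for example in the environment.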

Hi, I had the same problem. I found that the string being looked up in the vocab is different from the ones available. In my case the SMILES representations differed in the double bond: C(O) versus C(=O).
Could you tell me how the problem can be resolved?