Generation example not working
cristianregep opened this issue · comments
I downloaded the package and ran the suggested commands from the generation folder:
python get_vocab.py --min_frequency 100 --ncpu 8 < ../data/polymers/all.txt > ../data/polymers/vocab.txt
python preprocess.py --train ../data/polymers/train.txt --vocab data/polymers/vocab.txt --ncpu 8
I get the following error:
"""
Traceback (most recent call last):
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "preprocess.py", line 19, in tensorize
x = MolGraph.tensorize(mol_batch, vocab, common_atom_vocab)
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 168, in tensorize
tree_tensors, tree_batchG = MolGraph.tensorize_graph([x.mol_tree for x in mol_batch], vocab)
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/mol_graph.py", line 209, in tensorize_graph
fnode[v] = vocab[attr]
File "/home/cristian/Work/hgraph2graph/generation/poly_hgraph/vocab.py", line 43, in __getitem__
return self.hmap[x[0]], self.vmap[x]
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "preprocess.py", line 49, in <module>
all_data = pool.map(func, batches)
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/cristian/anaconda3/envs/hgraph/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: ('C1=CSC=N1', 'N1=[CH:2]S[CH:2]=[CH:1]1')
I traced the issue to preprocess.py, which loads the motifs from the vocab instead of the original motifs that passed the min_frequency threshold in get_vocab.py:
MolGraph.load_fragments([x[0] for x in vocab])
I worked around this by saving the original fragments to a separate file after running get_vocab.py and then loading them in preprocess.py. I think the molecules are not being split in the same way because the two sets of starting fragments differ.
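For anyone hitting the same KeyError, here is a minimal sketch of how one might check whether a failing key is actually present in the vocab file. It assumes each line of vocab.txt holds two whitespace-separated SMILES strings (the motif and its attachment-labelled form), which matches the shape of the key in the traceback but is not confirmed here. The helper names `load_vocab_pairs` and `missing_pairs` are hypothetical and not part of the repository, and the sample vocab lines are made up for illustration.

```python
def load_vocab_pairs(lines):
    """Parse vocab lines into a set of (motif, attached_motif) pairs.

    Assumes two whitespace-separated fields per line; other lines are skipped.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_pairs(needed, vocab_pairs):
    """Return the pairs from `needed` that are absent from the vocab."""
    return [pair for pair in needed if pair not in vocab_pairs]


# Made-up vocab content, for illustration only.
sample_vocab_lines = [
    "C1=CSC=N1 C1=CS[CH:1]=N1\n",
    "C1=CC=CC=C1 C1=CC=[CH:1]C=C1\n",
]
vocab_pairs = load_vocab_pairs(sample_vocab_lines)

# The key reported by the KeyError in the traceback above.
failing_key = ("C1=CSC=N1", "N1=[CH:2]S[CH:2]=[CH:1]1")
not_found = missing_pairs([failing_key], vocab_pairs)
print(not_found)
```

To run this against the real file, pass `open("../data/polymers/vocab.txt")` to `load_vocab_pairs` instead of the sample list. If the motif appears in the vocab but only with different attachment labels, that would be consistent with the molecules being split differently at preprocessing time.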
Hi,
I fixed this issue and now it should be able to run. Thank you!
Hi, when I tried to run the generation example, I got a similar error, shown below.
Could you check this error? @wengong-jin
code:
python preprocess.py --train ../data/polymers/train.txt --vocab ../data/polymers/inter_vocab.txt --ncpu 8
error:
Traceback (most recent call last):
File "preprocess.py", line 48, in <module>
all_data = pool.map(func, batches)
File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/st2/hayeon/anaconda3/envs/metasamp/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: ('CN1C(=O)C2=C3C(=C(F)C=C4C(=O)N(C)C(=O)C(=C43)C(F)=C2)C1=O', 'CN1C(=O)C2=CC(F)=C3C(=O)N(C)C(=O)C4=C3C2=C(C1=O)C(F)=[CH:1]4')
Hi,
I tried running the same command and there was no error. I think what you can do is run get_vocab.py and check whether its output differs from data/polymers/inter_vocab.txt. If they differ (I would be surprised), please try rerunning preprocess.py and see if it succeeds.
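One possible way to do that comparison is sketched below. It treats each vocab file as an unordered set of lines, on the assumption that line order does not matter, and normalizes whitespace before comparing. The function name `diff_vocab` and the file name `regenerated_vocab.txt` are hypothetical, and the sample lines are made up.

```python
def diff_vocab(old_lines, new_lines):
    """Compare two vocab listings as sets of whitespace-normalized lines.

    Returns (only_in_old, only_in_new), each sorted for stable output.
    """
    old = {" ".join(line.split()) for line in old_lines if line.strip()}
    new = {" ".join(line.split()) for line in new_lines if line.strip()}
    return sorted(old - new), sorted(new - old)


# Made-up contents standing in for the two files.
shipped = ["C C\n", "CC C[CH3:1]\n", "CO C[OH:1]\n"]
regenerated = ["CC  C[CH3:1]\n", "C C\n", "CN C[NH2:1]\n"]

only_in_shipped, only_in_regenerated = diff_vocab(shipped, regenerated)
print("only in shipped file:", only_in_shipped)
print("only in regenerated file:", only_in_regenerated)
```

With real files, something like `diff_vocab(open("../data/polymers/inter_vocab.txt"), open("regenerated_vocab.txt"))` would do the same job. Two empty lists mean the files contain the same entries.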