uta-smile / RetroXpert


What's the correct number of semi-template patterns?

otori-bird opened this issue · comments

I ran the following command several times and got different results:
python extract_semi_template_pattern.py --extract pattern

The number of semi-template patterns could be 650, 653, 655, or 656. It might eventually settle at 657, but I gave up.

With the previous version of the code I got 646, exactly matching the previous "train.py", and there seemed to be no problems throughout the whole project, except that the top-1 accuracy was a few percent lower than the paper's.

My Python package versions are:
python 3.6.12
torch 1.2.0
opennmt-py 1.0.0
rdkit 2019.03.4.0
dgl-cu102 0.4.2
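One common cause of run-to-run differences like this is that patterns are collected across multiprocessing workers and then filtered by a frequency cutoff, so the final count depends on iteration order and scheduling. Below is a minimal, hypothetical sketch (not the repo's actual `extract_semi_template_pattern.py` logic; `extract_fn` stands in for the per-reaction extraction step) of how counting in one pass and sorting before assigning indices makes the pattern list deterministic:

```python
from collections import Counter

def collect_patterns(reactions, extract_fn, min_count=2):
    """Deterministic pattern indexing sketch.

    extract_fn(rxn) is assumed to return the list of semi-template
    pattern strings found in one reaction. Counting everything in a
    single pass and sorting the surviving patterns gives a pattern ->
    index mapping that does not depend on worker scheduling or dict
    iteration order.
    """
    counts = Counter()
    for rxn in reactions:
        counts.update(extract_fn(rxn))
    # keep patterns at or above the cutoff, in a fixed (sorted) order
    kept = sorted(p for p, c in counts.items() if c >= min_count)
    return {p: i for i, p in enumerate(kept)}
```

If the extraction itself is deterministic per reaction, this yields the same pattern count on every run; any remaining variation would then point at the extraction step rather than the aggregation.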


I do not think it matters. You can use what you got.


OK, got it. Thanks a lot for your prompt reply!


Can you reproduce the results reported in the paper now? I trained the EGAT model and evaluated it following the README, but I only got 81.2% accuracy.

(nips) xiepengyu@c03:~/RetroXpert$ CUDA_VISIBLE_DEVICES=1 python train.py --typed --test_only  --load
Namespace(batch_size=32, dataset='USPTO50K', epochs=80, exp_name='USPTO50K_typed', gat_layers=3, heads=4, hidden_dim=128, in_dim=704, load=True, logdir='logs', lr=0.0005, seed=123, test_on_train=False, test_only=True, typed=True, use_cpu=False, valid_only=False)
Counter({1: 3486, 0: 1411, 2: 109, 11: 1})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:59<00:00,  1.48s/it]
pred_true_list size: 5007
Bond disconnection number prediction acc: 0.992211
Loss:  4.209371869616911
Bond disconnection acc (without auxiliary task): 0.812463

I don't know which score in the paper I should compare this result to (maybe 86.0?). It seems the first model is not trained well, so I haven't trained the second model yet.
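For reference, the "Bond disconnection acc" in the log above appears to be a per-molecule top-1 metric over `pred_true_list` (one entry per test molecule, true only if the whole disconnection prediction is correct). A minimal sketch of that computation, assuming that interpretation:

```python
def disconnection_accuracy(pred_true_list):
    """Hypothetical reconstruction of the logged metric: pred_true_list
    is assumed to hold one boolean per test molecule, True if every
    bond-disconnection label for that molecule was predicted correctly.
    Top-1 accuracy is then simply the fraction of True entries."""
    if not pred_true_list:
        raise ValueError("empty prediction list")
    return sum(pred_true_list) / len(pred_true_list)
```

Under this reading, 0.812463 over the 5007 test molecules means roughly 4068 molecules had a fully correct disconnection prediction.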