cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"

Discrepancy in MOSES dataset evaluation protocol

qiyan98 opened this issue

Hi,

I noticed that the number of molecules generated for evaluation on the MOSES dataset is 25000, as specified in the config file:

final_model_samples_to_generate: 25000

Your shared SMILES samples also contain 25000 molecules: https://github.com/cvignac/DiGress/blob/main/generated_samples/generated_smiles_moses.txt.
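To check that count locally, here is a minimal Python sketch. It assumes the shared file stores one SMILES string per line (an assumption, not confirmed in this thread), and the helper names `count_nonempty_lines` and `count_smiles_file` are hypothetical:

```python
from pathlib import Path
from typing import Iterable


def count_nonempty_lines(lines: Iterable[str]) -> int:
    """Count lines containing something other than whitespace."""
    return sum(1 for line in lines if line.strip())


def count_smiles_file(path: str) -> int:
    """Count molecules in a SMILES file.

    Assumption: one SMILES string per line, blank lines ignored.
    """
    with Path(path).open() as f:
        return count_nonempty_lines(f)


# Hypothetical usage after downloading the shared samples to the working directory:
# print(count_smiles_file("generated_smiles_moses.txt"))
```

Splitting the counting logic from the file handling keeps the core check usable on any iterable of lines, for example samples still held in memory.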

However, the original MOSES paper suggests using 30000 generated samples for evaluation.
Snapshot: [screenshot of the relevant passage from the MOSES paper, taken 2023-08-05]
Source: https://arxiv.org/pdf/1811.12823.pdf#page=3

I'm new to this dataset and am confused by the discrepancy. Could you explain why you chose 25000 instead of 30000?

Thanks,
Qi

If you check the MOSES code, I think it internally uses 20000 valid samples to compute the metrics. Since sampling 25k molecules already yields enough valid ones, we did not sample more.
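The reasoning in this reply comes down to simple arithmetic. With n samples and validity rate v, about n·v molecules are expected to be valid. Taking the 20000-valid-molecule figure from the reply (which the maintainer states with some uncertainty), 25000 samples suffice whenever v ≥ 20000 / 25000 = 0.8. Below is a minimal sketch in which the validity rates are hypothetical inputs, not reported DiGress results, and the function names are illustrative:

```python
import math


def expected_valid(n_samples: int, validity: float) -> float:
    """Expected number of valid molecules from n_samples at a given validity rate."""
    return n_samples * validity


def enough_valid(n_samples: int, validity: float, target_valid: int) -> bool:
    """True if the expected valid count reaches target_valid."""
    return expected_valid(n_samples, validity) >= target_valid


def min_samples_needed(target_valid: int, validity: float) -> int:
    """Smallest sample count whose expected valid count reaches target_valid."""
    if not 0 < validity <= 1:
        raise ValueError("validity must be in (0, 1]")
    return math.ceil(target_valid / validity)


# Hypothetical validity rates; 20000 is the figure quoted in the reply.
print(enough_valid(25000, 0.9, 20000))  # True: about 22500 expected valid molecules
print(enough_valid(25000, 0.7, 20000))  # False: about 17500 expected valid molecules
print(min_samples_needed(20000, 0.5))   # 40000
```

The check uses the expected number of valid molecules. An actual sampling run can land slightly above or below it.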

Got it. Thanks!