cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"

MOSES node type distribution doesn't sum to one

asiraudin opened this issue

Hello Clément,

When processing the MOSES dataset with MiDi's compute_all_statistics code, I noticed that the node_type argument contains incorrect values. Here is what I get for the train dataset:
[0.72200687 0.1364436 0.10383305 0.01433876 0.01637907 0.00546271 0.00153594 0.0]
while the hard-coded values in the MOSES dataset code are
[0.722338, 0.13661, 0.103549, 0.1421803, 0.163655, 0.005411, 0.00150, 0.0]

On what split did you compute the marginal distributions? The fourth and fifth hard-coded values are about ten times larger than the ones I get, and the hard-coded distribution doesn't sum to 1.
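The discrepancy can be checked with a few lines of NumPy. This is only a minimal sketch: the two arrays are the eight numeric entries quoted above, and the variable names and the factor-of-5 threshold are illustrative assumptions, not anything taken from the DiGress or MiDi code.

```python
import numpy as np

# The eight numeric entries quoted in this post (not read from any repository).
computed = np.array([0.72200687, 0.1364436, 0.10383305, 0.01433876,
                     0.01637907, 0.00546271, 0.00153594, 0.0])
hard_coded = np.array([0.722338, 0.13661, 0.103549, 0.1421803,
                       0.163655, 0.005411, 0.00150, 0.0])

# A valid marginal distribution should sum to 1.
computed_sum = computed.sum()
hard_coded_sum = hard_coded.sum()

# Ratio hard_coded / computed, evaluated only where the computed value is nonzero.
ratio = np.ones_like(computed)
nonzero = computed > 0
ratio[nonzero] = hard_coded[nonzero] / computed[nonzero]

# Hypothetical threshold: flag entries that are more than 5x larger.
suspect = np.flatnonzero(ratio > 5)

print("computed sum:  ", computed_sum)
print("hard-coded sum:", hard_coded_sum)
print("suspect indices:", suspect.tolist())
```

With these numbers the computed list sums to 1, the hard-coded list sums to roughly 1.275, and the flagged positions are indices 3 and 4 (zero-based), which is consistent with a misplaced decimal point in those two entries.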

Best,
Antoine

Hello Antoine, these values were computed over the full dataset. The reason is that if an atom type appears in the test set but not in the training set, the probability of generating it will be 0, which results in an NLL of +infinity. The same applies to the distribution of the number of nodes.
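The argument about the NLL can be illustrated with a toy example. The counts below are made up for illustration, and the `marginal` and `nll` helpers are hypothetical names, not functions from DiGress; the sketch only shows why a marginal estimated on the training split alone can give an infinite NLL on the test split.

```python
import numpy as np

def marginal(counts):
    """Normalize raw type counts into a probability distribution."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def nll(probs, counts):
    """Negative log-likelihood of observed counts under a categorical distribution."""
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    seen = counts > 0  # only types that actually occur contribute
    with np.errstate(divide="ignore"):  # log(0) = -inf is the expected outcome here
        return float(-(counts[seen] * np.log(probs[seen])).sum())

# Made-up counts for three atom types; the third never occurs in training.
train_counts = np.array([90, 10, 0])
test_counts = np.array([8, 1, 1])

p_train = marginal(train_counts)               # third type gets probability 0
p_full = marginal(train_counts + test_counts)  # every observed type gets probability > 0

nll_train = nll(p_train, test_counts)
nll_full = nll(p_full, test_counts)

print("NLL with train-only marginal:", nll_train)
print("NLL with full-dataset marginal:", nll_full)
```

In this toy setting the train-only marginal gives an infinite NLL on the test counts, while the marginal computed over all splits gives a finite one.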