gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)

Home Page: https://www.daml.in.tum.de/ppnp

Issue with reproducing the results: variance in accuracy

samiracs87 opened this issue · comments

Hi there,
I tried to run your code and model, but I have not been able to reproduce the results shown in simple_example_tensorflow.ipynb using the same seed value used there.
Does the seed set in idx_split_args fix the dataset split for training, early stopping, and validation? With the same seed fixed in idx_split_args, are there any other components that might cause the variance in accuracy (apart from the randomized components of training, such as dropout and weight initialization)?

Thanks!

As with any GNN I've seen, the performance of PPNP has rather large variance, so it is very unlikely that you will get exactly the results reported in simple_example_tensorflow.ipynb.

Have a look at reproduce_results.ipynb instead, where we run the model 100 times with varying seeds to obtain statistically significant results. Any proper model evaluation should do this.
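The idea behind that notebook can be sketched as follows. This is a minimal illustration, not the repo's actual code; `train_and_eval` is a hypothetical stand-in for one full training run of PPNP that here just simulates a noisy accuracy:

```python
import numpy as np

def train_and_eval(seed: int) -> float:
    # Hypothetical stand-in for training PPNP once with a given seed;
    # here we simulate an accuracy fluctuating around 83%.
    rng = np.random.default_rng(seed)
    return 0.83 + rng.normal(scale=0.005)

# Run the model many times with varying seeds and report the mean
# accuracy together with an uncertainty estimate.
accuracies = np.array([train_and_eval(seed) for seed in range(100)])
mean = accuracies.mean()
# 95% confidence interval of the mean via the normal approximation
ci95 = 1.96 * accuracies.std(ddof=1) / np.sqrt(len(accuracies))
print(f"Accuracy: {mean:.3f} +/- {ci95:.3f}")
```

Reporting a mean with an uncertainty interval over many seeds is what makes a comparison between two models meaningful; a single run can land anywhere within that interval.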

Yes, the seed fixes all parts of the data split, but only that. It does not affect model initialization. This repository is provided so you can check any hypothesis you make. Just look through the code. :)
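To illustrate that point: a fixed seed makes the index split deterministic while leaving other randomness (e.g. weight initialization) untouched. A minimal sketch, assuming a numpy-based split in the spirit of idx_split_args; the helper and its sizes below are hypothetical, not the repo's implementation:

```python
import numpy as np

def split_indices(n_nodes: int, n_train: int, n_stopping: int, seed: int):
    # Hypothetical sketch of a seeded split into train / early-stopping /
    # validation index sets; the repo's actual helper may differ in detail.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_nodes)
    return (perm[:n_train],
            perm[n_train:n_train + n_stopping],
            perm[n_train + n_stopping:])

# The same seed always yields the same split...
a = split_indices(2708, 140, 500, seed=42)
b = split_indices(2708, 140, 500, seed=42)
assert all(np.array_equal(x, y) for x, y in zip(a, b))

# ...but an unseeded weight initialization still differs between runs,
# which is one source of the run-to-run variance in accuracy.
w1 = np.random.randn(64, 16)
w2 = np.random.randn(64, 16)
assert not np.array_equal(w1, w2)
```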

Thanks for the helpful response! Sure, I'll look into the code and run more experiments.
BTW, nice work! :)