Seed and random results for embeddings

Question


sebkaz opened this issue a year ago

Hi!

I want to ask you about the seed parameter used by most of the node embedding models.
The documentation says that seed=42 is the default, but when you run, for example, Node2Vec twice, you get different embedding vectors.

Do you plan to change this so that whenever the default seed is used, workers is also set to 1?

Best regards
S.
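Why would workers=1 matter if the seed is fixed? A seed pins down the random numbers that are drawn, but with several training threads the order in which their updates reach the shared weights depends on OS thread scheduling, which no seed controls, and floating-point addition is not associative. The toy below is a hypothetical illustration in plain Python (not karateclub or gensim code; `train_shared_weight` is a made-up name) showing that the same fixed updates applied in different orders can give bitwise-different results.

```python
import itertools


def train_shared_weight(updates, order):
    """Toy stand-in for a shared embedding weight that receives additive
    updates; `order` is the sequence in which the updates arrive."""
    weight = 0.0
    for i in order:
        weight += updates[i]
    return weight


# Fixed "random" updates: everything a seed can control is held constant.
updates = [0.1, 0.2, 0.3]

# One worker: updates always arrive in the same order, so every run agrees.
single = {train_shared_weight(updates, range(len(updates))) for _ in range(5)}

# Several workers: arrival order depends on scheduling. Here every possible
# arrival order is enumerated instead of actually spawning threads.
multi = {
    train_shared_weight(updates, order)
    for order in itertools.permutations(range(len(updates)))
}

print(len(single))  # 1 distinct result
print(len(multi))   # more than 1 distinct result
```

The differences here are in the last bit, but in real training such small discrepancies feed into later updates and can grow into visibly different embedding vectors.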

Chris Tomlinson · Answer 1 · Thu Aug 17 2023 18:55:39 GMT+0800 (China Standard Time)

@sebkaz I've also noticed this issue (different embedding vectors per iteration of the same algorithm/params/seed).

I think it's a hard one to solve at the karateclub level, across all algorithms, given their reliance on other packages under the hood.

E.g. NetMF uses sklearn's TruncatedSVD, which defaults to a randomised solver, and sklearn seems to acknowledge this issue in its documentation:

SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.
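Sign indeterminacy is easy to see with NumPy: negating a matching pair of left and right singular vectors gives an equally valid decomposition, so a solver is free to return either. A common remedy, similar in spirit to scikit-learn's internal `svd_flip` helper, is to impose a sign convention after decomposing. `flip_signs` below is a hypothetical helper written for illustration, not karateclub's or scikit-learn's code, and it only fixes the sign; it does not remove other run-to-run differences of a randomised solver.

```python
import numpy as np


def flip_signs(u, vt):
    """Resolve SVD sign indeterminacy: for each component k, make the
    largest-magnitude entry of vt[k] positive and flip u[:, k] to match,
    which leaves (u * s) @ vt unchanged."""
    rows = np.arange(vt.shape[0])
    pivots = np.argmax(np.abs(vt), axis=1)
    signs = np.sign(vt[rows, pivots])
    return u * signs, vt * signs[:, None]


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
u, s, vt = np.linalg.svd(X, full_matrices=False)

# Negating a matching pair (u[:, 0], vt[0]) gives an equally valid SVD.
u_alt, vt_alt = u.copy(), vt.copy()
u_alt[:, 0] *= -1
vt_alt[0] *= -1
assert np.allclose((u_alt * s) @ vt_alt, X)

# After sign normalisation both factorisations agree.
u1, vt1 = flip_signs(u, vt)
u2, vt2 = flip_signs(u_alt, vt_alt)
print(np.allclose(u1, u2) and np.allclose(vt1, vt2))  # True
```

When calling scikit-learn's TruncatedSVD directly, passing an integer `random_state` is the documented way to make the randomised solver repeatable; whether a given karateclub model forwards its own seed there is not established in this thread.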

It seems to me that any workaround (e.g. setting workers=1 or using a different solver) would increase compute time and, on balance, may not be worth it. Users with a specific use case that requires reproducible results could instead address it on a case-by-case basis.
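For the case-by-case approach, a user first needs a quick way to tell whether a given configuration is reproducible. The sketch below is a hypothetical helper (`is_reproducible`, `seeded_fit` and `unseeded_fit` are made-up names) that calls a zero-argument fitting function several times in the same process and checks that the embeddings match exactly; the stand-in fits use NumPy only, and the karateclub usage shown in the comment is an assumption that is not executed here.

```python
import numpy as np


def is_reproducible(fit_embedding, n_runs=2):
    """Call `fit_embedding` (a zero-argument function returning a 2-D array)
    n_runs times and report whether every run matches the first exactly."""
    first = np.asarray(fit_embedding())
    for _ in range(n_runs - 1):
        if not np.array_equal(first, np.asarray(fit_embedding())):
            return False
    return True


def seeded_fit():
    # Stand-in for a model whose randomness is fully controlled by a seed.
    rng = np.random.default_rng(42)
    return rng.standard_normal((5, 3))


def unseeded_fit():
    # Stand-in for a model with uncontrolled randomness.
    return np.random.default_rng().standard_normal((5, 3))


# Hypothetical karateclub usage (not run here; assumes `graph` is a networkx
# graph with nodes indexed 0..n-1):
#     from karateclub import Node2Vec
#     def fit_node2vec():
#         model = Node2Vec(seed=42, workers=1)
#         model.fit(graph)
#         return model.get_embedding()
#     is_reproducible(fit_node2vec)

print(is_reproducible(seeded_fit))    # True
print(is_reproducible(unseeded_fit))  # False in practice
```

Exact equality is deliberately strict: it flags the last-bit differences that multi-threaded training produces, which a tolerance-based comparison would hide.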