benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Home Page: https://karateclub.readthedocs.io

Seed and random results for embeddings

sebkaz opened this issue and commented:

Hi!

I'd like to ask about the seed parameter used by most node embeddings.
The documentation says seed=42 is the default, but when you run, for example, Node2Vec twice, you get different embedding vectors.

Do you plan to change this so that, when the default seed is used, workers=1 is set as well?

Best regards,
S.

@sebkaz I've also noticed this issue (different embedding vectors on each run of the same algorithm with the same params and seed).

I think it's a hard one to solve at the karateclub level, across all algorithms, given reliance on other packages under the hood.

E.g. NetMF uses sklearn's TruncatedSVD, which defaults to a randomised solver, and the sklearn documentation seems to acknowledge this issue:

SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.
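For the sign part of the problem specifically, one possible workaround is to normalise the signs after fitting. Below is a minimal pure-Python sketch, assuming the embedding is a list of rows; the helper `normalize_signs` and the toy data are hypothetical and not part of karateclub or sklearn. It only removes sign flips, so any other run-to-run variation from a randomised solver would remain.

```python
def normalize_signs(embedding):
    """Flip each column so that its largest-magnitude entry is positive.

    `embedding` is a list of rows (one row per node). This is a hypothetical
    helper for illustration, not a karateclub or sklearn function.
    """
    n_cols = len(embedding[0])
    signs = []
    for j in range(n_cols):
        column = [row[j] for row in embedding]
        pivot = max(column, key=abs)  # entry with the largest absolute value
        signs.append(-1.0 if pivot < 0 else 1.0)
    return [[value * sign for value, sign in zip(row, signs)]
            for row in embedding]


# Toy data: two "runs" that differ only by a sign flip in the first column.
run_a = [[0.5, -0.2], [-0.1, 0.9], [0.3, 0.4]]
run_b = [[-0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]

aligned_a = normalize_signs(run_a)
aligned_b = normalize_signs(run_b)
print(aligned_a == aligned_b)
```

After normalisation the two toy runs compare equal, even though the raw vectors do not.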

It seems to me that any workaround (e.g. setting workers=1, using other solvers) would increase compute time and, on balance, may not be worth it. If a user has a specific use case that needs reproducible embeddings, they can address it on a case-by-case basis.
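As a rough illustration of why a fixed seed alone may not be enough when several workers are used: SGD-style updates depend on the order in which samples are applied, and with multiple threads that order is decided by the scheduler rather than by the seed. The toy `train` function below is an assumption made for illustration and is not how karateclub or its dependencies are implemented.

```python
def train(samples, weight=0.0, learning_rate=0.5):
    """Apply a simple SGD-like update for each sample, in the given order."""
    for x in samples:
        weight += learning_rate * (x - weight)
    return weight


samples = [1.0, 2.0, 3.0]

# One worker: samples are always processed in the same order.
single_worker = train(samples)

# Several workers: the same samples may be applied in a different order,
# simulated here by reordering the list by hand.
other_order = train([3.0, 1.0, 2.0])

print(single_worker, other_order)  # 2.125 1.625
```

The same data and the same starting weight give different results once the order changes, which is why a single worker is typically needed in addition to a seed.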