benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Home Page: https://karateclub.readthedocs.io


Parallel BigCLAM Gradient Computation

AlanGanem opened this issue

for node in nodes:
    # Gather the current embeddings of the node and its neighbors.
    nebs = [neb for neb in graph.neighbors(node)]
    neb_features = self._embedding[nebs, :]
    node_feature = self._embedding[node, :]
    # Per-node gradient; self._do_updates(...) then applies it to the embedding.
    gradient = self._calculate_gradient(node_feature, neb_features)

I've noticed that the gradient calculation in BigCLAM could be easily parallelized. It should be embarrassingly parallel, processing batches of n_jobs gradient calculations at a time. The only issue would be when some node in a batch is also a neighbor of another node in the same batch, since self._do_updates would have to wait for the entire batch to finish before updating the duplicated node embeddings.

Such a "collision" should be rare, given that the graph is sparse. Even if it did happen, I believe it would not be a huge problem, as long as it does not happen again for the same node in further iterations (which is even more unlikely).
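A minimal sketch of the batching I have in mind is below. It is not the library's API: node_gradient, fit_epoch_batched, n_jobs, and batch_size are placeholder names I made up, and the two callables stand in for the existing self._calculate_gradient and self._do_updates. Gradients within a batch are computed in parallel against a fixed snapshot of the embedding; updates are applied sequentially once the batch is done.

from joblib import Parallel, delayed

def node_gradient(embedding, graph, node, calculate_gradient):
    # Read-only gradient computation for one node; safe to run in parallel.
    nebs = list(graph.neighbors(node))
    neb_features = embedding[nebs, :]
    node_feature = embedding[node, :]
    return node, calculate_gradient(node_feature, neb_features)

def fit_epoch_batched(embedding, graph, nodes, calculate_gradient, do_updates,
                      n_jobs=4, batch_size=64):
    # One epoch: gradients per batch in parallel, updates applied afterwards.
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        # Embarrassingly parallel part: each call only reads the shared embedding.
        results = Parallel(n_jobs=n_jobs, backend="threading")(
            delayed(node_gradient)(embedding, graph, node, calculate_gradient)
            for node in batch
        )
        # Sequential part: updates are applied only after the whole batch has
        # finished, so a node that is also a neighbor of another node in the
        # batch sees the embeddings as they were at the start of the batch.
        for node, gradient in results:
            do_updates(node, gradient, embedding[node, :])

The threading backend is just to keep the sketch simple (the workers share the embedding array); a process-based backend would need the gradients shipped back before updating, which is what the batch loop already does.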

What do you guys think? I could implement this if you think it would be safe to accept the caveat I mentioned above.

Do you want to open a PR?

Sure, I just wanted to check first whether the possible caveats of parallelisation would theoretically hurt the final results.