benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Home Page: https://karateclub.readthedocs.io


Parallel BigCLAM Gradient Computation

AlanGanem opened this issue

for node in nodes:
    # Gather the current embeddings of the node and its neighbors.
    nebs = [neb for neb in graph.neighbors(node)]
    neb_features = self._embedding[nebs, :]
    node_feature = self._embedding[node, :]
    # Per-node gradient; self._do_updates(...) then applies it to the embedding.
    gradient = self._calculate_gradient(node_feature, neb_features)

I've noticed that the gradient calculation in BigCLAM could be easily parallelized. It should be embarrassingly parallel, processing batches of n_jobs gradient calculations at a time. The only issue would be when some node in a batch is also a neighbor of another node in the same batch, since self._do_updates would have to wait for the entire batch to finish before updating the duplicated node embeddings.

Such a "collision" should be rare, given that the graph is sparse. Even if it did happen, I believe it would not be a huge problem, as long as it does not happen again for the same node in further iterations (which is even more unlikely).
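A minimal sketch of the batching I have in mind is below. It is not the library's API: node_gradient, fit_epoch_batched, n_jobs, and batch_size are placeholder names I made up, and the two callables stand in for the existing self._calculate_gradient and self._do_updates. Gradients within a batch are computed in parallel against a fixed snapshot of the embedding; updates are applied sequentially once the batch is done.

from joblib import Parallel, delayed

def node_gradient(embedding, graph, node, calculate_gradient):
    # Read-only gradient computation for one node; safe to run in parallel.
    nebs = list(graph.neighbors(node))
    neb_features = embedding[nebs, :]
    node_feature = embedding[node, :]
    return node, calculate_gradient(node_feature, neb_features)

def fit_epoch_batched(embedding, graph, nodes, calculate_gradient, do_updates,
                      n_jobs=4, batch_size=64):
    # One epoch: gradients per batch in parallel, updates applied afterwards.
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        # Embarrassingly parallel part: each call only reads the shared embedding.
        results = Parallel(n_jobs=n_jobs, backend="threading")(
            delayed(node_gradient)(embedding, graph, node, calculate_gradient)
            for node in batch
        )
        # Sequential part: updates are applied only after the whole batch has
        # finished, so a node that is also a neighbor of another node in the
        # batch sees the embeddings as they were at the start of the batch.
        for node, gradient in results:
            do_updates(node, gradient, embedding[node, :])

The threading backend is just to keep the sketch simple (the workers share the embedding array); a process-based backend would need the gradients shipped back before updating, which is what the batch loop already does.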

What do you guys think? I could implement this if you think it would be safe to accept the caveat I mentioned above.

Do you want to open a PR?

Sure, I just wanted to check first whether the possible caveats of parallelisation would theoretically hurt the final results.