CWTSLeiden / networkanalysis

Java package that provides data structures and algorithms for network analysis.

java.lang.ArrayIndexOutOfBoundsException when using -q Modularity

The following command
java -cp ~/Bioinformatics/networkanalysis/build/libs/networkanalysis-1.1.0-5-ga3f342d.jar nl.cwts.networkanalysis.run.RunNetworkClustering -q CPM -m 1 -w -o /tmp/edges_clustering.txt edges.txt
runs just fine. However, if the -q option is set to "Modularity", I get the following crash:

RunNetworkClustering version 1.1.0
By Vincent Traag, Ludo Waltman, and Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Reading edge list from 'edges.txt'.
Reading edge list took 0s.
Network consists of 18386 nodes and 423926 edges with a total edge weight of 24546.50103795767.
Using singleton initial clustering.
Running Leiden algorithm.
Quality function: Modularity
Resolution parameter: 1.0
Minimum cluster size: 1
Number of random starts: 1
Number of iterations: 10
Randomness parameter: 0.01
Random number generator seed: random
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 5
at nl.cwts.networkanalysis.LocalMergingAlgorithm.findClustering(LocalMergingAlgorithm.java:190)
at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:228)
at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:276)
at nl.cwts.networkanalysis.IterativeCPMClusteringAlgorithm.improveClustering(IterativeCPMClusteringAlgorithm.java:91)
at nl.cwts.networkanalysis.run.RunNetworkClustering.main(RunNetworkClustering.java:413)

I've attached the input file here
edges.txt
I'm not sure what property of this input file would violate the preconditions of the software. The node IDs are 0-indexed integers, there are no duplicate edges, the left-hand node ID is always less than the right-hand node ID, etc.

The problem seems to be caused by the edges with zero weight. For the time being, you can work around it by simply removing those edges. This is safe, since edges with zero weight have no effect on the clustering anyway.
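
If you want to apply this workaround programmatically, here is a minimal sketch (not part of the networkanalysis package) that copies an edge list while dropping zero-weight edges. It assumes tab-separated lines of the form `node1<TAB>node2<TAB>weight`, as used with the `-w` option; adjust the parsing if your file differs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the networkanalysis package: copies an edge
// list while dropping edges whose weight is zero. Assumes tab-separated lines
// of the form "node1<TAB>node2<TAB>weight".
public class DropZeroWeightEdges
{
    public static void main(String[] args) throws IOException
    {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter writer = Files.newBufferedWriter(Paths.get(args[1])))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                String[] columns = line.split("\t");
                // Keep lines without a weight column; skip lines whose weight is 0.
                if (columns.length < 3 || Double.parseDouble(columns[2]) != 0)
                {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

For example, `java DropZeroWeightEdges edges.txt edges_nonzero.txt` writes a filtered copy that can then be passed to RunNetworkClustering.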

Nonetheless, this is a bug that should be corrected. We will provide a fix at some later time.

In particular, the root cause of the problem lies in the conditionals where we check whether a neighboring cluster has already been added.
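
The conditionals themselves are not quoted in this thread, but the suspected failure mode can be illustrated with a small standalone sketch (assumed variable names, not the actual source of LocalMergingAlgorithm.findClustering): if "already added" is inferred from a non-zero accumulated edge weight, a zero-weight edge lets the same cluster be appended twice, and an array sized for distinct clusters overflows. Running the sketch throws the same kind of ArrayIndexOutOfBoundsException.

```java
// Standalone illustration of the suspected failure mode. This is a simplified
// reproduction with assumed variable names, not the actual code of
// LocalMergingAlgorithm.findClustering.
public class ZeroWeightEdgeDemo
{
    public static void main(String[] args)
    {
        // One node with two neighbors, both currently assigned to cluster 7.
        int[] clusterOfNeighbor = { 7, 7 };
        double[] edgeWeights = { 0.0, 1.0 }; // the first edge has weight zero

        double[] edgeWeightPerCluster = new double[10];
        int[] neighboringClusters = new int[1]; // room for one distinct neighboring cluster
        int nNeighboringClusters = 0;

        for (int k = 0; k < clusterOfNeighbor.length; k++)
        {
            int cluster = clusterOfNeighbor[k];
            // The cluster is treated as "not yet added" as long as its
            // accumulated edge weight is zero. The zero-weight edge leaves the
            // weight at zero, so cluster 7 is appended a second time and the
            // array overflows with an ArrayIndexOutOfBoundsException.
            if (edgeWeightPerCluster[cluster] == 0)
                neighboringClusters[nNeighboringClusters++] = cluster;
            edgeWeightPerCluster[cluster] += edgeWeights[k];
        }

        System.out.println("Distinct neighboring clusters: " + nNeighboringClusters);
    }
}
```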

We should probably use a boolean array isClusterAdded instead of relying on the edge weight. I think this is the most robust way to address the issue. @neesjanvaneck, what do you think?
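
Roughly, the fix could look like the following sketch (again with assumed variable and parameter names, not an actual patch): membership in neighboringClusters is tracked explicitly, so the accumulated edge weight no longer doubles as the "already added" flag.

```java
// Sketch of the proposed fix, with assumed variable and parameter names.
public class IsClusterAddedSketch
{
    static int collectNeighboringClusters(int[] clusterOfNeighbor, double[] edgeWeights,
        double[] edgeWeightPerCluster, int[] neighboringClusters, boolean[] isClusterAdded)
    {
        int nNeighboringClusters = 0;
        for (int k = 0; k < clusterOfNeighbor.length; k++)
        {
            int cluster = clusterOfNeighbor[k];
            // Track membership explicitly instead of inferring it from the weight.
            if (!isClusterAdded[cluster])
            {
                isClusterAdded[cluster] = true;
                neighboringClusters[nNeighboringClusters++] = cluster;
            }
            // The accumulated edge weight is still needed for the quality
            // calculations, but zero-weight edges can no longer cause a
            // cluster to be added twice.
            edgeWeightPerCluster[cluster] += edgeWeights[k];
        }
        return nNeighboringClusters;
    }
}
```

The isClusterAdded entries that were set would of course need to be reset before the next node is processed, which is the extra book-keeping mentioned further below.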

Ah, I see. Makes perfect sense. I didn't intend to have any edges with zero weight. I'm happy to check for that condition and remove those edges, which are meaningless. Would you like me to close this issue (as far as I'm concerned, it's already resolved)?

No problem; I imagine you simply did not check for this condition before running the algorithm. No, let's leave the issue open, as the program shouldn't crash on this input.

I suppose a smaller change would simply be to filter out edges with zero weight in the code that reads the edges from the file. This way, you can enforce your own precondition and leave the downstream code unchanged. The advantage is that it avoids adding another variable for bookkeeping.
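
As a sketch of what such read-time filtering could look like (a hypothetical reader, not the actual parsing code in RunNetworkClustering):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;

// Hypothetical edge-list reader that enforces the precondition at read time:
// zero-weight edges are skipped, so downstream code never sees them.
public class FilteringEdgeListReader
{
    public static ArrayList<double[]> readEdges(String path) throws IOException
    {
        ArrayList<double[]> edges = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                String[] columns = line.split("\t");
                double weight = (columns.length >= 3) ? Double.parseDouble(columns[2]) : 1;
                if (weight == 0)
                    continue; // drop zero-weight edges while reading
                edges.add(new double[] { Double.parseDouble(columns[0]),
                                         Double.parseDouble(columns[1]), weight });
            }
        }
        return edges;
    }
}
```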

Yes, we would probably do that as well. However, the program still should not crash, even if an edge with zero weight were somehow included.