AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

missing `get_minimum_spanning_tree` and similar

rtbs-dev opened this issue · comments

Very impressive library so far. Just wanted to mention here, unless I'm misreading your API docs, that the Graph object doesn't have an implementation of Prim's or Kruskal's minimum/maximum spanning tree. This is the last thing keeping me on e.g. scipy.sparse.csgraph, and was the first thing I looked for here.

Ideally, I would imagine a slightly more useful MST interface that e.g. defaults to the spanning tree for the whole graph, but could accept an array of node activation flags and an (optional) cost matrix to calculate the MST on that induced subgraph. This is part of a simple way to approximate the steiner tree on those nodes, for instance. If the user doesn't supply a cost matrix, then the metric closure would work (again, if desired...MST on the original graph weights is probably the default).

I did find these, but a number expressly say the tree is not minimal:

  • spanning_arborescence
  • spanning_arborescence_kruskal
  • random_spanning_arborescence_kruskal (very nice! Wilson's algorithm? Is this a uniform-random sample over unweighted trees? Is the MST the mode for weighted edges, like it normally would be?

What do you mean by but a number expressly say the tree is not minimal?

I mean in the docs. From the second one:

Returns consistent spanning arborescence using Kruskal.
The spanning tree is NOT minimal.

From the third:

The spanning tree is NOT minimal. The given random_state is NOT the root of the tree.

And the first seems to look like the second, and never specifies if the arborescence is minimal over the provided edge weights.

All of these methods will return you the arborescences, which of course have a minimal number of edges. I don't recall whether I implemented one for the weighted case, as I don't have ever needed one. Do you know any good algorithm that scales well?

Sure; for starters, scipy implements minimum_spanning_tree (in fact, everything in the scipy.sparse.csgraph module would be a great thing to include here!)

The source code there has a reference implementation of Kruskal's algorithm (in a weighted setting).

The other option (outside of networkX's many implementations, one of which is Boruvka's algorithm) is graphblas, which would be very fast if done on the matrix, directly, but I can only find a version of Prim's algorithm in a C++ template repo...nothing for python-graphblas.

Note that these are all essentially O(|E|log|V|), so they are considered quite fast already. I think there's an expected-linear-time one, as well, e.g. here. But that would probably be more work than it's worth.

My use-case is typically finding MSTs in a metric closure, so Prim's algorithm runs faster (on dens graphs).