missing `get_minimum_spanning_tree` and similar

Question

rtbs-dev opened this issue 8 months ago

Very impressive library so far. Just wanted to mention here, unless I'm misreading your API docs, that the Graph object doesn't have an implementation of Prim's or Kruskal's minimum/maximum spanning tree. This is the last thing keeping me on e.g. scipy.sparse.csgraph, and was the first thing I looked for here.

Ideally, I would imagine a slightly more useful MST interface that defaults to the spanning tree for the whole graph, but could accept an array of node activation flags and an (optional) cost matrix to calculate the MST on that induced subgraph. This is part of a simple way to approximate the Steiner tree on those nodes, for instance. If the user doesn't supply a cost matrix, then the metric closure would work (again, if desired... MST on the original graph weights is probably the default).
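A rough sketch of what such an interface could look like, built on scipy.sparse.csgraph rather than on this library. The function name `mst_on_active_nodes` and its parameters (`active`, `cost`, `metric_closure`) are made up for illustration and are not part of any existing API; the input is assumed to be a dense, symmetric weight matrix where 0 means "no edge".

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path


def mst_on_active_nodes(weights, active=None, cost=None, metric_closure=False):
    """Hypothetical helper: MST over the nodes flagged in `active`.

    weights: (n, n) symmetric array, 0 meaning "no edge".
    active: boolean flags of length n; None means every node.
    cost: optional (n, n) cost matrix used instead of `weights`.
    metric_closure: if True and no `cost` is given, use all-pairs
        shortest-path distances in the full graph as the costs.
    Returns (idx, tree): the active node indices and a sparse matrix
    over those nodes holding the chosen tree edges.
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    if active is None:
        active = np.ones(n, dtype=bool)
    active = np.asarray(active, dtype=bool)

    if cost is None:
        if metric_closure:
            # Distances are measured in the FULL graph, then restricted.
            cost = shortest_path(csr_matrix(weights), directed=False)
        else:
            cost = weights  # default: original edge weights

    idx = np.flatnonzero(active)
    sub = np.array(cost, dtype=float)[np.ix_(idx, idx)]
    sub[~np.isfinite(sub)] = 0.0  # unreachable pairs become "no edge"
    return idx, minimum_spanning_tree(csr_matrix(sub))


# Usage: a 4-cycle with one expensive edge (0-3 costs 5).
w = np.array([[0, 1, 0, 5],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [5, 0, 1, 0]], dtype=float)
terminals = [True, False, False, True]
_, tree_default = mst_on_active_nodes(w, active=terminals)
_, tree_closure = mst_on_active_nodes(w, active=terminals, metric_closure=True)
```

With the original weights, the induced subgraph on nodes 0 and 3 only has the direct edge; with the metric closure, the same pair is joined at the cost of the shortest path through nodes 1 and 2. Expanding closure edges back into actual paths (the remaining step of the Steiner approximation) is left out here.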

I did find these, but a number expressly say the tree is not minimal:

spanning_arborescence
spanning_arborescence_kruskal
random_spanning_arborescence_kruskal (very nice! Wilson's algorithm? Is this a uniform-random sample over unweighted trees? Is the MST the mode for weighted edges, like it normally would be?)

Luca Cappelletti · Answer 1 · Tue Dec 05 2023 23:48:12 GMT+0800 (China Standard Time)

What do you mean by "but a number expressly say the tree is not minimal"?

Rachael Sexton · Answer 2 · Wed Dec 06 2023 11:43:31 GMT+0800 (China Standard Time)

I mean in the docs. From the second one:

Returns consistent spanning arborescence using Kruskal.
The spanning tree is NOT minimal.

From the third:

The spanning tree is NOT minimal. The given random_state is NOT the root of the tree.

And the first seems to look like the second, and never specifies if the arborescence is minimal over the provided edge weights.

Luca Cappelletti · Answer 3 · Wed Dec 06 2023 16:05:58 GMT+0800 (China Standard Time)

All of these methods will return arborescences, which of course have a minimal number of edges. I don't recall whether I implemented one for the weighted case, as I have never needed one. Do you know of any good algorithm that scales well?

Rachael Sexton · Answer 4 · Thu Dec 07 2023 04:24:09 GMT+0800 (China Standard Time)

Sure; for starters, scipy implements minimum_spanning_tree (in fact, everything in the scipy.sparse.csgraph module would be a great thing to include here!)

The source code there has a reference implementation of Kruskal's algorithm (in a weighted setting).
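For readers who don't want to dig through the scipy source, here is a minimal pure-Python sketch of weighted Kruskal with a union-find structure. It is not the scipy implementation, just the textbook algorithm; the name `kruskal_mst` and the `(weight, u, v)` edge format are choices made for this example.

```python
def kruskal_mst(num_nodes, edges):
    """Textbook Kruskal on an undirected graph.

    edges: iterable of (weight, u, v) tuples with 0 <= u, v < num_nodes.
    Returns (total_weight, chosen_edges). If the graph is disconnected,
    the result is a minimum spanning forest.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        # Find the component root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0.0
    for w, u, v in sorted(edges):  # cheapest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both ends already connected: edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # union by size
        size[ru] += size[rv]
        tree.append((w, u, v))
        total += w
        if len(tree) == num_nodes - 1:
            break  # spanning tree is complete
    return total, tree


# Usage: the 4-cycle 0-1-2-3 with one expensive edge (0-3 costs 5).
total, tree = kruskal_mst(4, [(1, 0, 1), (1, 1, 2), (1, 2, 3), (5, 0, 3)])
```

A maximum spanning tree falls out of the same routine by negating the weights before the call. The sort dominates the running time, which is where the O(|E| log |E|) = O(|E| log |V|) bound comes from.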

The other option (outside of NetworkX's many implementations, one of which is Borůvka's algorithm) is graphblas, which would be very fast if done directly on the matrix, but I can only find a version of Prim's algorithm in a C++ template repo... nothing for python-graphblas.

Rachael Sexton · Answer 5 · Thu Dec 07 2023 04:59:31 GMT+0800 (China Standard Time)

Note that these are all essentially O(|E|log|V|), so they are considered quite fast already. I think there's an expected-linear-time one, as well, e.g. here. But that would probably be more work than it's worth.

My use-case is typically finding MSTs in a metric closure, so Prim's algorithm runs faster (on dense graphs).
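To make the dense-graph point concrete: on a full cost matrix (such as a metric closure) Prim's algorithm can skip the heap and scan an array instead, which gives O(|V|^2) total, whereas sorting all ~|V|^2 edges for Kruskal costs O(|V|^2 log |V|). Below is a minimal pure-Python sketch; the name `prim_dense_mst` and the list-of-lists input are assumptions made for this example, and a real implementation would vectorise the inner loops.

```python
import math


def prim_dense_mst(cost):
    """Array-based Prim on a dense, symmetric cost matrix.

    cost: square list of lists; cost[i][j] is the edge cost,
          math.inf where there is no edge.
    Returns (total_cost, edges) with edges as (parent, child) pairs.
    """
    n = len(cost)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known cost to attach each node
    parent = [-1] * n
    best[0] = 0.0          # start growing the tree from node 0
    total, edges = 0.0, []
    for _ in range(n):
        # O(n) scan for the cheapest node outside the tree (no heap).
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax attachment costs using row u of the matrix.
        row = cost[u]
        for v in range(n):
            if not in_tree[v] and row[v] < best[v]:
                best[v] = row[v]
                parent[v] = u
    return total, edges


# Usage: metric closure of the unit-weight path 0-1-2-3.
closure = [[0, 1, 2, 3],
           [1, 0, 1, 2],
           [2, 1, 0, 1],
           [3, 2, 1, 0]]
total, edges = prim_dense_mst(closure)
```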