haskell / containers

Assorted concrete container types

Home Page: https://hackage.haskell.org/package/containers

Data.Graph.bcc is not efficient

meooow25 opened this issue

  • The collect function concatenates lists recursively over a tree, which makes its time complexity quadratic in the worst case. This can be avoided using difference lists.
  • The do_label step builds some unnecessary intermediate trees, similar to the dfs we had (#882).
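
To illustrate the first point, here is a minimal sketch of the difference-list idea on a plain tree flattening. It is not the actual collect function from Data.Graph; the names flattenSlow and flattenFast are hypothetical and only show why nested appends get expensive and how passing the "rest of the list" as an argument avoids it.

```haskell
import Data.Tree (Tree (..))

-- Naive version (hypothetical name). Every level appends the results of
-- its children, so an element at depth d passes through d layers of (++).
-- On a path-shaped tree of n nodes this costs O(n^2).
flattenSlow :: Tree a -> [a]
flattenSlow (Node x ts) = x : concatMap flattenSlow ts

-- Difference-list version (hypothetical name). Each subtree receives the
-- list that should follow it and conses its own elements onto the front,
-- so every element is consed exactly once: O(n) overall.
flattenFast :: Tree a -> [a]
flattenFast t = go t []
  where
    go :: Tree a -> [a] -> [a]
    go (Node x ts) rest = x : foldr go rest ts
```

Both functions return the same preorder list; only the cost differs.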

I'm not aware of common use cases for bcc, so I'm not sure if this affects anyone.
But as long as we have it, we should make it efficient.

I can send a PR.

Everything should be efficient if it can be, yes! Thanks.

I can't make head or tail of the current algorithm. Is it explained in the paper? Please comment your version liberally; this doesn't seem likely to be obvious.

Agreed that the code is not descriptive at all. It is explained in the paper though. I'll make sure to add comments to make things a bit clearer 👍

The new bcc code still uses forest twice, successively (i.e., not interleaved). This strikes me as rather bad. King and Launchbury pushed for lazy dfs; Tarjan's paper does ... something else. I think the question you raise is a good one: what can we do to help manage complexity without realizing too much forest? Lazy dfs can help in some cases, but for bcc we have to make sure the lazily produced depth-first forest isn't shared with what's used to build an array.

I'm not sure I follow your comment. We do not have lazy dfs, so we get the full forest when we run dff. If we traverse it twice there is no extra memory cost. But if the question is whether we can avoid the time cost of traversing it twice, then we can think about it. It's possible to traverse it once but we need arbitrary dnum lookups when collecting, which means mutable arrays to keep the same complexity. If we go one step further and combine the dfs with collecting the result, that makes it very close to Tarjan's algorithm.

We can achieve semi-lazy dfs (lazy in preorder) using lazy ST.
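
As a rough sketch of what "lazy in preorder" could look like, the following uses Control.Monad.ST.Lazy to produce only the preorder vertex list, not a forest. It is an assumption-laden illustration rather than a proposal for Data.Graph: the name preorderLazy, the local Graph alias, and the use of an IntSet in an STRef (instead of a mutable array) are all choices made here for brevity.

```haskell
import Control.Monad.ST.Lazy (runST)
import Data.Array (Array, (!), listArray)
import qualified Data.IntSet as IS
import Data.STRef.Lazy (modifySTRef, newSTRef, readSTRef)

-- Adjacency-list graph, same shape as Data.Graph's Graph (local alias).
type Graph = Array Int [Int]

-- Preorder of a depth-first search from the given roots (hypothetical name).
-- In lazy ST the recursive call is only run when its result is demanded,
-- so consuming a prefix of the output visits only a prefix of the graph.
preorderLazy :: Graph -> [Int] -> [Int]
preorderLazy g roots = runST $ do
  seen <- newSTRef IS.empty
  let go [] = return []
      go (v : vs) = do
        s <- readSTRef seen
        if IS.member v s
          then go vs                       -- already visited, skip
          else do
            modifySTRef seen (IS.insert v)
            rest <- go (g ! v ++ vs)       -- neighbours first, then siblings
            return (v : rest)              -- cons is available before rest runs
  go roots
```

The IntSet makes each visited check logarithmic rather than constant time; a mutable array would restore the usual bound at the cost of a slightly longer example.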

Here's another thought for a place to start: can we get rid of the dnum array? Everything's a bit tangled, but it looks to me like we're almost certainly traversing the tree in depth-first order. So instead of checking an array to figure out where we are, we can keep track as we go. Something like

bicomps :: Int -> Tree Vertex -> (Int, Forest [Vertex])
collect :: Int -> Tree Vertex -> (Int, Int, ...)

Each function takes the current preorder index and returns the new one.
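
For illustration only, here is the index-threading pattern on its own, separated from bcc. The function labelPre is a hypothetical helper, not the bicomps or collect sketched above: it takes the next free preorder index, labels every node, and returns the index to continue from, so no dnum array is consulted.

```haskell
import Data.List (mapAccumL)
import Data.Tree (Tree (..))

-- Label each node with its preorder number (hypothetical helper).
-- The Int argument is the next unused index; the Int in the result is the
-- next unused index after the whole subtree has been numbered.
labelPre :: Int -> Tree a -> (Int, Tree (Int, a))
labelPre n (Node x ts) = (n', Node (n, x) ts')
  where
    -- Children are numbered left to right, each starting where the
    -- previous sibling's subtree stopped.
    (n', ts') = mapAccumL labelPre (n + 1) ts
```

Applied to a whole forest, the same function can be passed to mapAccumL with an initial index of 0.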

This is obviously ... unpleasant. That brings us back to the general DFS abstraction question.