haskell / containers

graphFromEdges is one way to construct a Graph in Data.Graph, and the line in it that actually constructs the graph looks like this:

(containers/containers/src/Data/Graph.hs, line 450 in d8d163a)

    graph           = array bounds0 [(,) v (mapMaybe key_vertex ks) | (,) v (_,    _, ks) <- edges1]

This is very lazy. The elements of the array (lists of vertices) are lazy, and the lists themselves are generated lazily when required. I doubt building up all these thunks is good for us. So I checked whether construction time improves if we are strict, and it does: from 371 ms ± 18 ms with 107 MB allocated to 323 ms ± 30 ms with 77 MB allocated on the largest graph.
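To make "strict" concrete, here is a minimal sketch of what a strict result would mean. It is not the change that was benchmarked above: it simply forces every adjacency list after graphFromEdges has run, so it shows the semantics but still allocates the thunks first. The names forceGraph and graphFromEdgesStrict are hypothetical and do not exist in containers.

```haskell
import Data.Array (elems)
import Data.Graph (Graph, Vertex, graphFromEdges)
import Data.List (foldl')

-- Hypothetical helper: evaluate every adjacency list completely
-- (spine and vertices) before handing the graph back.
forceGraph :: Graph -> Graph
forceGraph g = foldl' step () (elems g) `seq` g
  where
    -- foldr seq () walks the whole list, forcing each vertex on the way.
    step () vs = foldr seq () vs

-- Hypothetical wrapper: same interface as graphFromEdges, but the graph
-- is fully evaluated by the time the result tuple is in WHNF.
graphFromEdgesStrict
  :: Ord key
  => [(node, key, [key])]
  -> (Graph, Vertex -> (node, key, [key]), key -> Maybe Vertex)
graphFromEdgesStrict es =
  case graphFromEdges es of
    (g, fromVertex, toVertex) ->
      let g' = forceGraph g
      in g' `seq` (g', fromVertex, toVertex)
```

A real implementation would presumably force the lists while the array is being built, which is where the allocation savings would come from.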

Now I've tried to think of situations where lazily constructing a graph is useful.
The only case I can think of is if the user constructs a large graph through graphFromEdges, then runs dfs on a subset of the graph the user already knows is not connected to the rest of the graph. Then they avoid paying the cost of constructing the full graph. But this seems far-fetched.
All other functions like dff, topSort, scc, bcc will always evaluate the full graph.
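The far-fetched case can be illustrated directly on a Graph value. In this sketch the adjacency list of vertex 2 is deliberately left as undefined to stand in for an unevaluated thunk; partialGraph and reachableFrom0 are made-up names, and the graph is built by hand rather than through graphFromEdges.

```haskell
import Data.Array (listArray)
import Data.Graph (Graph, Vertex, dfs)
import Data.Tree (flatten)

-- Vertices 0 and 1 form one component. Vertex 2 is disconnected, and its
-- adjacency list would crash if anything ever evaluated it.
partialGraph :: Graph
partialGraph = listArray (0, 2) [[1], [0], undefined]

-- dfs only looks up the adjacency lists of vertices it actually visits,
-- so the undefined entry for vertex 2 is never touched.
reachableFrom0 :: [Vertex]
reachableFrom0 = concatMap flatten (dfs partialGraph [0])
```

With a strict graph the undefined entry would be evaluated at construction time, which is exactly the cost the lazy version lets this user skip.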

So, is there any other scenario where lazily constructing a graph is useful?

And if not, should we make it strict?
This would improve the times and also make it consistent with buildG, the other way in Data.Graph to build a graph. The second reason alone might be reason enough to do this.

I'm a bit skeptical about this one, though I won't rule it out entirely. Would it be possible to speed it up without being eager about those binary searches? I would start with these lines:

    edges1          = zipWith (,) [0..] sorted_edges

    graph           = array bounds0 [(,) v (mapMaybe key_vertex ks) | (,) v (_,    _, ks) <- edges1]
    key_map         = array bounds0 [(,) v k                       | (,) v (_,    k, _ ) <- edges1]
    vertex_map      = array bounds0 edges1

We build three arrays using the same list, edges1. I think this might be a spot where it'll pay to get lower level with arrays, and build all three "in parallel". Another option (which I'm guessing will be slower and harder to optimize) would be to build vertex_map and then build the other two arrays based on it.

That's interesting, I'll test it out. One thing I have tried is replacing array with listArray, which makes the zipWith (,) [0..] unnecessary and is a little faster.
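As a small illustration of that substitution: when the elements are already in index order, as sorted_edges is, pairing them with [0..] and calling array gives the same result as calling listArray on the bare list. The function names viaArray and viaListArray are only for this sketch.

```haskell
import Data.Array (Array, array, listArray)

-- Current shape: pair each element with its index, then build the array.
viaArray :: [a] -> Array Int a
viaArray xs = array (0, length xs - 1) (zipWith (,) [0 ..] xs)

-- Alternative: listArray assigns indices in order, so no zip is needed.
viaListArray :: [a] -> Array Int a
viaListArray xs = listArray (0, length xs - 1) xs
```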

More concretely, use this building block:

    array3 :: Ix i => (i, i) -> [(i, a, b, c)] -> (Array i a, Array i b, Array i c)

You'll probably need a bunch of inline pragmas to make things work nicely, including likely one on that zip.

(Or maybe you can avoid the zip with an accumulator? I dunno.)
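One possible shape for array3, as an untuned sketch: fill three mutable arrays in a single pass over the list inside ST, then freeze them. It uses only the array package; newSTArray and demo are hypothetical names, no inline pragmas are included, and nothing here has been benchmarked. Elements are written without being forced, so the binary searches would stay lazy.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array (Array, Ix, elems)
import Data.Array.ST (STArray, newArray_, writeArray)
import Data.Array.Unsafe (unsafeFreeze)
import Data.Foldable (for_)

-- Hypothetical helper that pins newArray_ to boxed ST arrays.
newSTArray :: Ix i => (i, i) -> ST s (STArray s i e)
newSTArray = newArray_

-- Build three arrays from one list in a single traversal.
-- Indices missing from the list are left as undefined elements.
array3 :: Ix i => (i, i) -> [(i, a, b, c)] -> (Array i a, Array i b, Array i c)
array3 bnds rows = runST $ do
  arrA <- newSTArray bnds
  arrB <- newSTArray bnds
  arrC <- newSTArray bnds
  for_ rows $ \(i, a, b, c) -> do
    writeArray arrA i a   -- stores the element as-is, without forcing it
    writeArray arrB i b
    writeArray arrC i c
  -- unsafeFreeze avoids a copy; the mutable arrays never escape runST
  -- and are not written to afterwards.
  (,,) <$> unsafeFreeze arrA <*> unsafeFreeze arrB <*> unsafeFreeze arrC

-- Usage: split rows of (index, a, b, c) into three arrays.
demo :: (String, [Bool], [String])
demo = (elems xs, elems ys, elems zs)
  where
    (xs, ys, zs) =
      array3 (0 :: Int, 2) [(0, 'a', True, "x"), (1, 'b', False, "y"), (2, 'c', True, "z")]
```

Dropping the index from the input tuples and counting up with an accumulator instead would be one way to avoid the zip.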

Is graphFromEdges too lazy?