Is graphFromEdges too lazy?
meooow25 opened this issue · comments
graphFromEdges
is one way to construct a Graph
in Data.Graph
, and the line in it which actually constructs the graph looks like
containers/containers/src/Data/Graph.hs
Line 450 in d8d163a
This is very lazy. The elements of the array
(lists of vertices) are lazy, and the lists themselves would be lazily generated when required. I doubt building up all these thunks is good for us. So I checked out if construction times improve if we are strict, and it does, from 371 ms ± 18 ms, 107 MB allocated
to 323 ms ± 30 ms, 77 MB allocated
on the largest graph.
Now I've tried to think of situations where lazily constructing a graph is useful.
The only case I can think of is if the user constructs a large graph through graphFromEdges
, then runs dfs
on a subset of the graph the user already knows is not connected to the rest of the graph. Then they avoid paying the cost of constructing the full graph. But this seems far-fetched.
All other functions like dff
, topSort
, scc
, bcc
will always evaluate the full graph.
So, is there any other scenario where lazily constructing a graph is useful?
And if not, should we make it strict?
This would improve the times and also make it consistent with buildG
, the other way in Data.Graph
to build a graph. The second reason alone might be good reason to do this.
I'm a bit skeptical about this one, though I won't rule it out entirely. Would it be possible to speed it up without being eager about those binary searches? I would start with these lines:
edges1 = zipWith (,) [0..] sorted_edges
graph = array bounds0 [(,) v (mapMaybe key_vertex ks) | (,) v (_, _, ks) <- edges1]
key_map = array bounds0 [(,) v k | (,) v (_, k, _ ) <- edges1]
vertex_map = array bounds0 edges1
We build three arrays using the same list, edges1
. I think this might be a spot where it'll pay to get lower level with arrays, and build all three "in parallel". Another option (which I'm guessing will be slower and harder to optimize) would be to build vertex_map
and then build the other two arrays based on it.
That's interesting, I'll test it out. One thing I have tried is replacing array
with listArray
, the zipWith (,) [0..]
is avoidable, which makes it a little faster.
More concretely, use this building block:
array3 :: Ix i => (i, i) -> [(i, a, b, c)] -> (Array i a, Array i b, Array i c)
You'll probably need a bunch of inline pragmas to make things work nicely, including likely one on that zip.
(Or maybe you can avoid the zip with an accumulator? I dunno.)