Graph partitioning interface

Question

Graph partitioning interface

mtfishman opened this issue 2 years ago · comments

Currently we have functions:

using NamedGraphs
using ITensors
using ITensorNetworks
using Metis

tn = ITensorNetwork(named_grid((4, 4)); link_space=4)
partition(tn, 2)
subgraphs(tn, nv(tn) ÷ 2)

for partitioning graphs, which output:

julia> partition(tn, 2)
16-element Dictionaries.Dictionary{Tuple{Int64, Int64}, Int64}
 (1, 1) │ 1
 (2, 1) │ 1
 (3, 1) │ 1
 (4, 1) │ 1
 (1, 2) │ 1
 (2, 2) │ 1
 (3, 2) │ 1
 (4, 2) │ 1
 (1, 3) │ 2
 (2, 3) │ 2
 (3, 3) │ 2
 (4, 3) │ 2
 (1, 4) │ 2
 (2, 4) │ 2
 (3, 4) │ 2
 (4, 4) │ 2

julia> subgraphs(tn, nv(tn) ÷ 2)
DataGraph{Int64, Vector{Tuple{Int64, Int64}}, Any, NamedGraph{Int64}, NamedEdge{Int64}} with 2 vertices:
2-element Vector{Int64}:
 1
 2

and 1 edge(s):
1 => 2

with vertex data:
2-element Dictionaries.Dictionary{Int64, Vector{Tuple{Int64, Int64}}}
 1 │ [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 2)]
 2 │ [(1, 3), (2, 3), (3, 3), (4, 3), (1, 4), (2, 4), (3, 4), (4, 4)]

and edge data:
0-element Dictionaries.Dictionary{NamedEdge{Int64}, Any}

We should make them consistent so that they both can accept the partition size or the number of partitions. Additionally, their names should give a better indication for what they output.

For example, I could imagine a function:

julia> partition_vertices(tn, 2)
2-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 2)]
 [(1, 3), (2, 3), (3, 3), (4, 3), (1, 4), (2, 4), (3, 4), (4, 4)]

julia> partition_vertices(tn; partition_size=nv(tn) ÷ 2)
2-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 2)]
 [(1, 3), (2, 3), (3, 3), (4, 3), (1, 4), (2, 4), (3, 4), (4, 4)]

Or perhaps partition_vertices should output a DataGraph like subgraphs does right now, which also keeps track of which partitions have connected edges.

@JoeyT1994 curious about your opinion on this.

Miles · Answer 1 · Wed Dec 21 2022 06:19:21 GMT+0800 (China Standard Time)

Agreed the new function interface is better. Would it make sense in the future to allow partitioning by properties too? Like a keyword argument by=... that takes a function that determines which partition to put a vertex into? (Only if there's a use case for it of course.)

Matt Fishman · Answer 2 · Wed Dec 21 2022 07:52:20 GMT+0800 (China Standard Time)

I think that kind of thing would be better handled externally by applying functions like SplitApplyCombine.group to the vertices vertices(tn): https://github.com/JuliaData/SplitApplyCombine.jl#grouping.

The functions I'm discussing above are more focused on graph partitioning based on graph properties, and wrap functionality from packages like Metis.jl and KaHyPar.jl.

Matt Fishman · Answer 3 · Wed Dec 21 2022 07:57:42 GMT+0800 (China Standard Time)

I could imagine a functionality like:

subgraph_vertices = [
  [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 2)],
  [(1, 3), (2, 3), (3, 3), (4, 3), (1, 4), (2, 4), (3, 4), (4, 4)],
]
dg = subgraph_connectivity(tn, subgraph_vertices)

which outputs a graph with edges indicating which subgraphs are connected to each other (like the output of the currently available subgraphs function I show above), in this case an edge 1 => 2 indicating subgraph 1 has at least one edge connected to subgraph 2. The subgraph_vertices input could then be determined using functions like partition_vertices or SplitApplyCombine.group.

Joseph Tindall · Answer 4 · Wed Dec 21 2022 22:58:21 GMT+0800 (China Standard Time)

Yes I think that a `partition_vertices' function which allows the user to choose partition_number vs partition_size would be helpful.

@emstoudenmire, there are a few cases from BP I can think of where partitioning by properties would be helpful..

For instance if we want to do belief propagation over a wave function tn but don't want to explicitly flatten the network due to memory overhead we can form Z = <tn|tn> without actually doing the contraction via

It is then logical to ensure those ITensors in Z which share a site index are placed into the same subgraphs and do belief propagation that way.

Additionally if we do want to efficiently do a Belief Propagation version of DMRG we would need to form (but not contract) <tn|H|tn> and again want to keep all ITensors which share site inds in the same subgraphs.

@mtfishman In the case of Simple Belief Propagation this seems quite easily done with SplitApplyCombine.group (as those tensors which share a site index have all elements of their vertex tuple identical - other than the first). What, however, if I wanted to have larger subgraphs for GBP (e.g. 2*n ITensors of Z in each subgraph, each ITensor must share the subgraph with its conjugate ITensor, and the network of ITensors formed within each subgraph must be connected)?

Perhaps we would need to write up a function for that?

With this functionality the code we have written in beliefpropagation.jl can be used to form SBP or GBP message tensors of such objects without needing any keyword arguments itself, as all of the subgraphing is taken care of elsewhere which is neat.

Matt Fishman · Answer 5 · Thu Dec 22 2022 00:08:25 GMT+0800 (China Standard Time)

Good example, I think it hits on the question of how to do graph partitioning with constraints on which vertices will get grouped together, or another way of seeing it is partitioning a graph that has already been partitioned (i.e. how to work with and represent partitioned graphs).

Here is a solution with current tools that are available, where we do it in two passes, one where you first group the vertices you know you always want grouped together, and next you partition the resulting graph:

using ITensors
using Graphs
using NamedGraphs
using ITensorNetworks
using SplitApplyCombine
using Metis

using ITensorNetworks: findfirst_on_vertices

function subgraph_connectivity(g, vertex_groups)
  dg_subgraphs = DataGraph(NamedGraph(keys(vertex_groups)), vertex_groups)
  for e in edges(g)
    s1 = findfirst_on_vertices(vertex_groups -> src(e) ∈ vertex_groups, dg_subgraphs)
    s2 = findfirst_on_vertices(vertex_groups -> dst(e) ∈ vertex_groups, dg_subgraphs)
    if (!has_edge(dg_subgraphs, s1, s2) && s1 != s2)
      add_edge!(dg_subgraphs, s1, s2)
    end
  end
  return dg_subgraphs
end

s = siteinds("S=1/2", named_grid(8))
tn = ITensorNetwork(s; link_space=2)
Z = prime(tn; sites=[]) ⊗ tn

# Group the vertices by the first dimension, so pairs vertices up like:
# [[(1, 1), (1, 2)], [(2, 1), (2, 2)], [(3, 1), (3, 2)], ...]
vertex_groups = group(v -> v[1], vertices(Z))

# Uses the function defined above to turn the vertex groups into a DataGraph
# which encodes which vertex groups have edges connected between them
# Probably to provide more information for partitioning, it should store how
# many of the original edges are connected
Z_subgraphs = subgraph_connectivity(Z, vertex_groups)

# Partition the partitioned graph
Z_partitions = subgraphs(Z_subgraphs, 2)

# Group the vertices in terms of the original vertices of `Z`
Z_verts = [reduce(vcat, (Z_subgraphs[p] for p in Z_partitions[v])) for v in vertices(Z_partitions)]

which returns:

4-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 1), (1, 2), (2, 1), (2, 2)]
 [(3, 1), (3, 2), (4, 1), (4, 2)]
 [(5, 1), (5, 2), (6, 1), (6, 2)]
 [(7, 1), (7, 2), (8, 1), (8, 2)]

This is a bit verbose and opaque right now, but I think with some helper functions and improvements to the partitioning interface we can make it nicer. I would prefer a solution for this kind of problem which involves composing various well defined and simple tools instead of making one big function that tries to do everything, since you can imagine there may be many variations that people will want.

This points to an original question I had about designing this package and the graph types, which was that it seemed natural to want to make use of partitioned graphs, for example a tensor network with multiple tensors grouped together and treated as a single vertex. I had actually originally designed a PartitionedGraph type but found that it was unwieldy and too complicated to design. I think that DataGraph can handle partitioned graphs quite naturally, since ultimately partitioned graphs can be viewed as graphs that store subgraphs in the metadata of the vertices (or information defining subgraphs, like subgraph vertex groups). Additionally, the vertex names of a NamedGraph can give some human-level information about the connectivity of the graph, which can help with partitioning as well, like in the example above.

Joseph Tindall · Answer 6 · Thu Dec 22 2022 01:23:05 GMT+0800 (China Standard Time)

Hi Matt,

Thanks to the solution to that problem. That is really neat and I agree that writing some helper functions that can turn it into a few lines when it is needed to be used mid-program would be perfect.

Do you think such functions should go into subgraphs.jl? For instance one could be subgraph_connectivity as you wrote above, with the extra that the edge data is number of edges from the original graph actually involved.

Tomorrow, I could add that to subgraphs.jl and make a PR if you wish?

Matt Fishman · Answer 7 · Sat Dec 24 2022 02:51:16 GMT+0800 (China Standard Time)

Closed by #30.