frareba / MultiClassBlockGraphGenerator

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Multi-class version of block graph generator

Version of the block graph generator that can produce datasets with more than two classes.

Usage

The script produces output in the file format that is used by the TUDataset.

Use it on the command line with the following arguments

Multiclass Blockgraph Dataset Generator

optional arguments:
  -h, --help   show this help message and exit
  --name NAME  Name of output file(s)
  --N N        Number of graphs per class
  --Nc Nc      Number of classes
  --n N        Number of blocks
  --c C        Size of blocks
  --p P        Edge probability
  --m M        Number of noise edges
  --seed SEED  Random seed

Note that for certain parameter combinations, the script might not terminate.

Original description:

Stochastic Block Model Graphs

We consider graphs generated by the stochastic block model (Wang and Wong, 1987). The specific generation of graphs is described by the following process: Let $T$ be some random tree of a predefined size. Create two (non-isomorphic) graphs $G_1,G_2$ by adding a new edge to $T$. To ensure that the classification task is non-trivial, we require that $G_1$ and $G_2$ have the same multiset of vertex degrees. We then generate a set of graphs for each $G \in G_1,G_2$ by repeating the following process:

  1. Let $\widehat{G}$ be the empty graph.
  2. For each $v \in G$, add a set of $c$ vertices $v_1,...,v_c$ to $\widehat{G}$.
  3. For each $v \in G$, connect pairs ${v_i,v_j} \in V(\widehat{G})$ by an edge with probability $p$.
  4. For all pairs ${v_i,u_j} \in V(\widehat{G})$ with ${v,u} \in E(G)$, connect them by an edge with probability $p$, as well.
  5. Connect a number of $m_x$ prior unconnected vertex pairs ${v_i,u_j}$ in $\widehat{G}$ as noise edges.

The resulting classification task is to assign the generated graph $\widehat{G}$ to the underlying graph structure $G \in G_1,G_2$. The figure depicts an example of two such generated graphs.

Graphs generated by slightly different underlying structures (depicted in grey). Each block in the underlying structure contains 3 vertices. Two vertices are connected by an edge with some probability p if they belong to the same block or their blocks are connected in the underlying structure (depicted by solid black lines). Furthermore, both graphs additionally contain mx = 4 noise edges (depicted by dashed red lines)

All datasets considered in the evaluation of (Schulz et al., 2022) were created starting with a random tree of size $16$ which was extended by a single random edge resulting in graphs $G_1,G_2$. For each classification task we generated $200$ random graphs for each $G \in G_1,G_2$. The number of vertices $c$ contained in a block was set to $8$ in all experiments. For each set of parameter choices, we generated $5$ datasets and provide the mean accuracy.

References

  • Yuchung J. Wang and George Y. Wong (1987): Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397), 8–19.
  • Till Schulz, Tamas Horvath, Pascal Welke, Stefan Wrobel (2022): A Generalized Weisfeiler-Lehman Graph Kernel. Machine Learning. https://doi.org/10.1007/s10994-022-06131-w

Installation and Requirements

The file in this repository requires the networkx library and can then be used as a standalone script.

Usage

The script produces output in the file format that is used by the TUDataset.

Use it on the command line with the following arguments

Blockgraph Dataset Generator

optional arguments:
  -h, --help   show this help message and exit
  --name NAME  Name of output file(s)
  --N N        Number of graphs per class
  --n N        Number of blocks
  --c C        Size of blocks
  --p P        Edge probability
  --m M        Number of noise edges
  --seed SEED  Random seed

Reference

If you use this generator in any of your work, please cite our article:

Till Schulz, Tamas Horvath, Pascal Welke, Stefan Wrobel: A generalized Weisfeiler-Lehman graph kernel. Mach Learn (2022). https://doi.org/10.1007/s10994-022-06131-w

About

License:MIT License


Languages

Language:Python 100.0%