gitabcworld / graph_db

Synthetic graph database generation.

Graph Database

Synthetic graph database generation. Each class is generated from a prototype, to which distortions are then applied. To run the default example:

$ pip install -r requirements.txt
$ python generate_dataset.py

Usage

Usage of the generate_dataset.py script:

usage: generate_dataset.py [-h] [--dirPrototypes DIRPROTOTYPES]
                           [--nodeThreshold NODETHRESHOLD]
                           [--dirDataset DIRDATASET] [--division DIVISION]
                           [--unbalanced] [--nodeDisplace NODEDISPLACE]
                           [--nodeAdd NODEADD] [--edgeMaximum EDGEMAXIMUM]
                           [--addEdge ADDEDGE] [--rmEdge RMEDGE]
                           [--edgeConnection EDGECONNECTION]

Generate a dataset from a given prototype folder.

optional arguments:
  -h, --help            show this help message and exit
  --dirPrototypes DIRPROTOTYPES
                        prototype folder
  --nodeThreshold NODETHRESHOLD
                        prototypes node threshold
  --dirDataset DIRDATASET
                        dataset folder
  --division DIVISION   division (tr, val, te)
  --unbalanced          Unbalanced database
  --nodeDisplace NODEDISPLACE
                        node std for distort its position
  --nodeAdd NODEADD     node std for adding a node in a source neighbourhood
  --edgeMaximum EDGEMAXIMUM
                        maximum number of new edges that can be added
  --addEdge ADDEDGE     probability to add new edge
  --rmEdge RMEDGE       probability to remove an edge
  --edgeConnection EDGECONNECTION
                        probability new edge is connected to an existing node

Prototypes

The prototypes folder contains the prototypes used to generate the different datasets. Several prototype folders can also be combined:

$ --dirPrototypes ['./prototypes/Letters/', './prototypes/Digits/']

The proposed prototypes can be found here.

Parameter discussion

Evaluation of the effect of each of the proposed parameters.

Add nodes

Controlled by the --nodeThreshold parameter. It increases the number of nodes in the prototypes before the deformation is applied, by trying to add nodes along the edges, equally spaced at the specified distance.

Some examples with the normalized graph A, before and after adding the nodes:

[Figures: original graph, followed by the results for --nodeThreshold 0.10, 0.20, 0.30 and 0.40]
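
The node-insertion step described in this section can be illustrated with a minimal sketch. It is not the code from generate_dataset.py: the graph representation (a dict of integer node ids to 2D positions plus a list of edges), the function name `subdivide_edges`, and the rule "split each edge into equal pieces no longer than the threshold" are all assumptions made for illustration.

```python
import math


def subdivide_edges(nodes, edges, threshold):
    """Split every edge into equal segments no longer than `threshold`.

    nodes: dict mapping integer node id -> (x, y)   (assumed representation)
    edges: list of (u, v) node-id pairs
    Returns a new (nodes, edges) pair; the inputs are not modified.
    """
    new_nodes = dict(nodes)
    new_edges = []
    next_id = max(nodes) + 1  # hypothetical id scheme: consecutive integers
    for u, v in edges:
        (x0, y0), (x1, y1) = nodes[u], nodes[v]
        length = math.hypot(x1 - x0, y1 - y0)
        # Number of equal pieces needed so that each one is <= threshold.
        pieces = max(1, math.ceil(length / threshold))
        prev = u
        for k in range(1, pieces):
            t = k / pieces  # equispaced position along the edge
            new_nodes[next_id] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            new_edges.append((prev, next_id))
            prev = next_id
            next_id += 1
        new_edges.append((prev, v))
    return new_nodes, new_edges
```

Under these assumptions a smaller threshold yields more intermediate nodes, which matches the trend the figures in this section are meant to show.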

Node distortion

Controlled by the --nodeDisplace parameter. It adds random noise to each node position, drawn from a normal distribution centered at the node with standard deviation set by --nodeDisplace.

Some examples with graph A where --nodeThreshold has been set to 0.40.

[Figures: original graph, followed by the results for --nodeDisplace 0.01, 0.05, 0.10 and 0.20]
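
As a rough illustration of the node distortion described in this section, the sketch below adds independent Gaussian noise to each coordinate. The dict-based node representation and the name `displace_nodes` are assumptions, not the actual implementation in generate_dataset.py.

```python
import random


def displace_nodes(nodes, std, rng=None):
    """Return a copy of `nodes` with Gaussian noise added to every position.

    nodes: dict mapping node id -> (x, y)   (assumed representation)
    std:   standard deviation of the noise, playing the role of --nodeDisplace
    rng:   optional random.Random instance, for reproducible results
    """
    if rng is None:
        rng = random.Random()
    return {
        node_id: (x + rng.gauss(0.0, std), y + rng.gauss(0.0, std))
        for node_id, (x, y) in nodes.items()
    }
```

Passing a seeded `random.Random` makes the distortion repeatable, which is convenient when comparing several values of the standard deviation on the same prototype.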

Insert edges

Controlled by the --edgeMaximum, --addEdge, --edgeConnection and --nodeAdd parameters. It adds at most --edgeMaximum edges, each with probability --addEdge. The source node is always an existing node in the graph; the target node is an existing one with probability --edgeConnection. If a new node is added instead, it is created in a neighbourhood with standard deviation --nodeAdd.

Some examples with graph A where --nodeThreshold has been set to 0.40, --nodeDisplace 0.10, --edgeMaximum 10 and --nodeAdd 0.8.

[Figures: original graph, followed by the results for --addEdge 0.05, 0.10, 0.25 and 0.50, each shown with --edgeConnection 0.75 and with --edgeConnection 0.50]
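
One possible reading of the edge-insertion rule in this section is sketched below. It assumes one independent trial per potential edge, a new node placed around the source node, and a dict/list graph representation. These choices, and the name `insert_edges`, are illustrative assumptions and not taken from generate_dataset.py.

```python
import random


def insert_edges(nodes, edges, edge_maximum, add_edge, edge_connection,
                 node_add, rng=None):
    """Add up to `edge_maximum` edges, each with probability `add_edge`.

    nodes: dict mapping integer node id -> (x, y)   (assumed representation)
    edges: list of (u, v) node-id pairs
    Returns a new (nodes, edges) pair; the inputs are not modified.
    """
    if rng is None:
        rng = random.Random()
    nodes = dict(nodes)
    edges = list(edges)
    next_id = max(nodes) + 1  # hypothetical id scheme: consecutive integers
    for _ in range(edge_maximum):
        if rng.random() >= add_edge:
            continue  # this trial adds nothing
        source = rng.choice(list(nodes))  # the source always exists already
        others = [n for n in nodes if n != source]
        if others and rng.random() < edge_connection:
            target = rng.choice(others)  # connect to an existing node
        else:
            # Create a new node in a Gaussian neighbourhood of the source.
            sx, sy = nodes[source]
            target = next_id
            nodes[target] = (sx + rng.gauss(0.0, node_add),
                             sy + rng.gauss(0.0, node_add))
            next_id += 1
        edges.append((source, target))
    return nodes, edges
```

For simplicity this sketch does not check whether the chosen edge already exists, so duplicate edges are possible.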

Remove edge

Controlled by the --rmEdge parameter. It removes each edge at random with probability --rmEdge; however, at least one edge is always kept.

Some examples with graph A where --nodeThreshold has been set to 0.40, --nodeDisplace 0.10, --edgeMaximum 10, --nodeAdd 0.8, --addEdge 0.1 and --edgeConnection 0.75.

[Figures: original graph, followed by the results for --rmEdge 0.01, 0.05, 0.10 and 0.20]
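
The removal rule in this section, including the guarantee that one edge survives, could look roughly like the sketch below. The edge-list representation, the name `remove_edges`, and the way the surviving edge is chosen are assumptions for illustration only.

```python
import random


def remove_edges(edges, rm_edge, rng=None):
    """Drop each edge independently with probability `rm_edge`.

    edges: list of (u, v) node-id pairs   (assumed representation)
    If every edge would be removed, one randomly chosen edge is kept instead.
    """
    if rng is None:
        rng = random.Random()
    edges = list(edges)
    kept = [edge for edge in edges if rng.random() >= rm_edge]
    if not kept and edges:
        kept = [rng.choice(edges)]  # never return an edgeless graph
    return kept
```

This sketch leaves the node set untouched, so nodes may end up isolated. Whether the original script prunes such nodes is not stated in this README.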

Some Examples

Different levels of distortion for graph A with --nodeThreshold 0.4.

LOW

  • --nodeDisplace 0.05
  • --nodeAdd 0.4
  • --edgeMaximum 8
  • --addEdge 0.1
  • --rmEdge 0.05
  • --edgeConnection 0.75
[Figures: three example graphs generated with the LOW settings]

MEDIUM

  • --nodeDisplace 0.1
  • --nodeAdd 0.5
  • --edgeMaximum 10
  • --addEdge 0.1
  • --rmEdge 0.05
  • --edgeConnection 0.6
[Figures: three example graphs generated with the MEDIUM settings]

HIGH

  • --nodeDisplace 0.2
  • --nodeAdd 0.8
  • --edgeMaximum 10
  • --addEdge 0.25
  • --rmEdge 0.05
  • --edgeConnection 0.6
[Figures: three example graphs generated with the HIGH settings]
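
To switch between the three distortion levels listed in this section, the settings can be kept in a small table and expanded into a command line. The helper below is a convenience sketch, not part of the repository: the names `PRESETS` and `build_command` are hypothetical, while the flags and values are the ones listed for each level.

```python
import shlex

# Flag values copied from the LOW / MEDIUM / HIGH lists in this section.
PRESETS = {
    "LOW": {"nodeDisplace": 0.05, "nodeAdd": 0.4, "edgeMaximum": 8,
            "addEdge": 0.1, "rmEdge": 0.05, "edgeConnection": 0.75},
    "MEDIUM": {"nodeDisplace": 0.1, "nodeAdd": 0.5, "edgeMaximum": 10,
               "addEdge": 0.1, "rmEdge": 0.05, "edgeConnection": 0.6},
    "HIGH": {"nodeDisplace": 0.2, "nodeAdd": 0.8, "edgeMaximum": 10,
             "addEdge": 0.25, "rmEdge": 0.05, "edgeConnection": 0.6},
}


def build_command(level, node_threshold=0.4):
    """Return the argument list for one distortion level (hypothetical helper)."""
    args = ["python", "generate_dataset.py",
            "--nodeThreshold", str(node_threshold)]
    for flag, value in PRESETS[level].items():
        args += ["--" + flag, str(value)]
    return args


if __name__ == "__main__":
    # Print the commands rather than running them.
    for level in PRESETS:
        print(level + ":", shlex.join(build_command(level)))
```

The returned list can be passed to `subprocess.run` as-is. Flags not listed for a level, such as --dirPrototypes and --dirDataset, are left at the script's defaults.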

Authors

License:MIT License


Languages

Language:Python 100.0%