The code repository of the paper "Computing Graph Edit Distance via Neural Graph Matching" (PVLDB 2023). This figure gives an overview of our GEDGNN model.
We implement GEDGNN and the other baseline machine learning models in `models.py` and `GedMatrix.py`. Import these models by:

```python
from models import GPN, SimGNN, GedGNN, TaGSim
```
The k-best matching algorithm used for post-processing is implemented in `kbest_matching_with_lb.py`. Import it by:

```python
from kbest_matching_with_lb import KBestMSolver
```
To reproduce the experimental results in our paper, please refer to the folder `experiments/`, which contains three projects (Overall Performance, Path Results and Powerlaw Graphs). The first two projects correspond to the experiments on real graph data sets in Table 3, Figure 3 and Figure 5 of the paper; the last one corresponds to the experiments on large-scale synthetic power-law graphs in Figure 4 of the paper. They share the same core code but differ slightly in ground-truth generation and evaluation metrics.
Taking `Overall Performance` as an example, the content of a project is shown in the figure below.
All projects can be run on a Windows system without a GPU. To run them on Linux, or to train GEDGNN with a GPU, minor adjustments to the source code are necessary.
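For the GPU case, the usual PyTorch adjustment looks roughly like the sketch below. This is a hypothetical illustration, not the repository's actual code: the variable names and the exact places where the model and tensors must be moved depend on `src/main.py`.

```python
import torch

# Pick a device once: fall back to CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical usage: move the model and every input batch to that device.
# model = GedGNN(args).to(device)
# batch = {k: v.to(device) for k, v in batch.items()}
print(device.type)
```

Note that the pinned `torch==1.8.2+cpu` wheel has no CUDA support, so GPU training also requires installing a CUDA build of the same PyTorch version.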
Install Python 3.8 and the packages specified in `requirements.txt`:
- dgl==0.7.0
- matplotlib==3.3.4
- networkx==2.5
- numpy==1.20.1
- scipy==1.6.2
- texttable==1.6.4
- torch==1.8.2+cpu
- torch_geometric==2.0.4
- tqdm==4.59.0
Under the `Overall Performance` directory, we can train GEDGNN for 20 epochs and test the corresponding models using:

```bash
python src/main.py --model-name GedGNN --dataset AIDS --model-epoch-start 0 --model-epoch-end 20 --model-train 1
```
Run post-processing using the 20th-epoch model by:

```bash
python src/main.py --model-name GedGNN --dataset AIDS --model-epoch-start 20 --model-epoch-end 20 --model-train 0
```
`AIDS` for the parameter `dataset` can be replaced by the other data sets `Linux` and `IMDB`.
`GedGNN` for the parameter `model-name` can be replaced by the baseline models `SimGNN`, `GPN` and `TaGSim`.
The `Path Results` and `Powerlaw Graphs` projects are run in the same way. The detailed running parameters for training, testing and post-processing can be found in `arg.txt` under each directory.
In the following, we present a running demo of `Overall Performance`.
- Firstly, we train GEDGNN for 2 epochs.

```bash
python src/main.py --model-name GedGNN --dataset AIDS --model-epoch-start 0 --model-epoch-end 2 --model-train 1 --demo
```
Note: the parameter `--demo` triggers a quick demo run in which only a small number of training and testing graph pairs are used. Please do not use `--demo` in formal evaluations.
In each epoch, the project will do the following things sequentially.
- Train the model for 1 epoch, and then output the training results.
- Store the latest model under `model_save/`. Note that all historical models are stored as `AIDS_1`, `AIDS_2`, ... for further evaluation.
- Test the latest model, and then output the testing results. Note that in this step we merely test the GED value prediction generated by the machine learning model. For GEDGNN, the results denote the performance of GEDGNN-value.
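The per-epoch routine above can be sketched as follows. This is a hypothetical outline with made-up helper names, not the repository's actual training loop; it only illustrates the `<dataset>_<epoch>` checkpoint naming.

```python
import os
import tempfile

def run_epochs(dataset, num_epochs, save_dir):
    """Sketch of the per-epoch routine: train, checkpoint, test."""
    saved = []
    for epoch in range(1, num_epochs + 1):
        # 1. Train the model for one epoch and output training results (omitted here).
        # 2. Store the latest model as <dataset>_<epoch>, keeping all history.
        path = os.path.join(save_dir, f"{dataset}_{epoch}")
        open(path, "w").close()  # stand-in for torch.save(model.state_dict(), path)
        saved.append(os.path.basename(path))
        # 3. Test the latest model on GED value prediction only (GEDGNN-value).
    return saved

with tempfile.TemporaryDirectory() as d:
    print(run_epochs("AIDS", 2, d))  # checkpoints named AIDS_1, AIDS_2
```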
- Secondly, we run the post-processing algorithm using `AIDS_2` (the model from the final epoch).

```bash
python src/main.py --model-name GedGNN --dataset AIDS --model-epoch-start 2 --model-epoch-end 2 --model-train 0 --demo
```
The results of post-processing denote the performance of GEDGNN-matching. Recall that an edit path is generated by the post-processing algorithm, and the predicted GED value is the length of this path.
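Since the predicted GED is the length of a concrete edit path, it is always an upper bound on the true GED. On small graphs, predictions can be sanity-checked against the exact GED, e.g. with `networkx` (already in `requirements.txt`); the two graphs below are illustrative toys, not taken from the data sets.

```python
import networkx as nx

# Two small test graphs: a 3-node path and a 3-node triangle.
g1 = nx.path_graph(3)      # edges: 0-1, 1-2
g2 = nx.complete_graph(3)  # edges: 0-1, 1-2, 0-2

# Exact GED by exhaustive search; feasible only for small graphs.
ged = nx.graph_edit_distance(g1, g2)
print(ged)  # a single edge insertion turns the path into the triangle
```

Exhaustive search becomes infeasible quickly as graphs grow, which is exactly why learned models such as GEDGNN are needed for the larger pairs.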
- All results are sequentially output into `result/results.txt`. A sample result file, which depicts the output format, is shown below.