Heterogeneous Graph Neural Network with Hypernetworks for Knowledge Graph Embedding

Overview

This repository contains the source code for HKGN: Heterogeneous Graph Neural Network with Hypernetworks for Knowledge Graph Embedding (ISWC 2022).

├── model
│   ├── __init__.py
│   ├── encoder_decoder.py    The overall HKGN model.
│   ├── gcn_encoder.py        HGNN encoder.
│   └── hyper_conv_layer.py   Multi-relational graph convolution.
└── run.py  Root script for running the project.

Data preprocessing

Unzip the compressed datasets.

mkdir data
unzip data_compressed/FB15k-237.zip -d data/
unzip data_compressed/WN18RR.zip -d data/

This gives you all the data needed to reproduce the results reported in the paper.

FB15k-237

data
├── FB15k-237
Original data:
│   ├── train.txt
│   ├── valid.txt
│   ├── test.txt
Subsets divided by relation categories:
│   ├── 1-1.txt
│   ├── 1-n.txt
│   ├── n-1.txt
│   ├── n-n.txt
Subsets divided by entity degrees:
│   ├── ent100.txt  [0, 100)
│   ├── ent200.txt  [100, 200)
│   ├── ent300.txt  [200, 300)
│   ├── ent400.txt  [300, 400)
│   ├── ent500.txt  [400, 500)
│   ├── ent1000.txt [500, 1000)
│   └── entmax.txt  [1000, max)

WN18RR

└── WN18RR
Original data:
    ├── train.txt
    ├── valid.txt
    ├── test.txt
Subsets divided by entity degrees:
    ├── ent10.txt  [0, 10)
    ├── ent25.txt  [10, 25)
    ├── ent50.txt  [25, 50)
    ├── ent100.txt [50, 100)
    └── ent500.txt [100, max)
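
After unzipping, a quick check that every listed file is present can catch an incomplete extraction early. The following is a minimal sketch: the helper `missing_files` and the `EXPECTED` table are illustrative names that are not part of the repository, and the file names are copied from the listings above.

```python
from pathlib import Path

# File stems copied from the directory listings above; each is expected as "<stem>.txt".
EXPECTED = {
    "FB15k-237": ["train", "valid", "test", "1-1", "1-n", "n-1", "n-n",
                  "ent100", "ent200", "ent300", "ent400", "ent500",
                  "ent1000", "entmax"],
    "WN18RR": ["train", "valid", "test",
               "ent10", "ent25", "ent50", "ent100", "ent500"],
}


def missing_files(data_root, dataset):
    """Return the expected file names that are absent from data_root/dataset."""
    folder = Path(data_root) / dataset
    return [f"{stem}.txt" for stem in EXPECTED[dataset]
            if not (folder / f"{stem}.txt").is_file()]


if __name__ == "__main__":
    for name in EXPECTED:
        print(name, "missing:", missing_files("data", name))
```

An empty list for both datasets means the extraction matches the listings.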

Dependencies

Python 3.x
torch-scatter (must be compatible with your installed PyTorch version; see the torch-scatter installation instructions)
tqdm
PyTorch >= 1.5.0
ordered-set
numpy

Dependencies can be installed using requirements.txt.
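
To confirm that the environment is complete before training, a small check along the following lines may help. It is a sketch, not part of the repository: `missing_modules` is a hypothetical helper, and the import names (for example `ordered_set` for the `ordered-set` package and `torch_scatter` for `torch-scatter`) are assumed from the usual naming of these packages.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (the pip package "ordered-set" is imported as "ordered_set").
REQUIRED_MODULES = ["torch", "torch_scatter", "tqdm", "ordered_set", "numpy"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {missing}")
```

This only checks that the modules can be located; it does not verify that torch-scatter was built against the installed PyTorch version.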

Training the model

To reproduce the best performance we have found:

# default dataset: FB15k-237
# FB15k-237 layer 2
python run.py -name test_fb_layer2 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel -gcn_layer 2 -layer2_drop 0.2 -layer1_drop 0.3
# FB15k-237 layer 1
python run.py -name test_fb_layer1 -gcn_drop 0.4 -model hyper_gcn -gpu 0 -exp hyper_mr_parallel
# WN18RR layer 1
python run.py -name test_wn18rr_layer1 -gcn_drop 0.4 -model hyper_gcn -batch 256 -gpu 0 -data WN18RR

The detailed hyperparameters:

Hyperparameters	FB15k-237	WN18RR
$d_x$	100	100
$d_y$	2	2
$d_z$	100	100
Relational kernel size	3x3	3x3
Number of relational filters	32 (layer 1) / 16 (layer 2)	32
HKGN layers	2	1
Initial entity embedding size	100	100
Final entity embedding size	200	200
Message dropout	0.4	0.4
Batch size	1024	256
Learning rate	0.001	0.001
Label smoothing	0.1	0.1
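
For readers unfamiliar with the label smoothing entry in the table, the sketch below shows the form used in ConvE-style 1-N training, which CompGCN-derived code commonly follows: each 0/1 target y becomes (1 - eps) * y + 1/N, where N is the number of entities. Whether HKGN applies exactly this form is an assumption, and `smooth_labels` is an illustrative name.

```python
def smooth_labels(labels, epsilon=0.1):
    """ConvE-style label smoothing over a 0/1 target vector with one slot per entity."""
    num_entities = len(labels)
    return [(1.0 - epsilon) * y + 1.0 / num_entities for y in labels]


if __name__ == "__main__":
    # Toy target over four candidate entities; only the second one is correct.
    targets = [0.0, 1.0, 0.0, 0.0]
    print(smooth_labels(targets, epsilon=0.1))
```

With the tens of thousands of entities in the real datasets, the 1/N term is tiny, so the main effect is scaling the positive targets down to roughly 1 - eps.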

Customize the training strategies (parallel and iterative) for multi-relational knowledge graphs by setting the argument -exp:

hyper_mr_parallel: Performs all single-relational graph convolutions simultaneously (default). Higher GPU memory footprint, faster training.
hyper_mr_iter: Performs the single-relational graph convolutions one relation at a time. Lower GPU memory footprint, slower training.
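
The trade-off between the two strategies can be illustrated with a toy example. This is not the HKGN convolution: scalar relation weights stand in for the relation-specific kernels, plain Python lists stand in for GPU tensors, and all function names are hypothetical. It only shows that both schedules produce the same aggregation while holding a different number of messages in memory at once.

```python
from collections import defaultdict


def aggregate_parallel(features, edges, rel_weight):
    """Build the messages for every edge first, then sum them per tail entity."""
    messages = [(t, rel_weight[r] * features[h]) for h, r, t in edges]  # all held at once
    out = defaultdict(float)
    for t, m in messages:
        out[t] += m
    return dict(out), len(messages)


def aggregate_iterative(features, edges, rel_weight):
    """Process one relation's subgraph at a time, accumulating into the output."""
    by_rel = defaultdict(list)
    for h, r, t in edges:
        by_rel[r].append((h, t))
    out = defaultdict(float)
    peak = 0  # largest number of messages held at any one time
    for r, pairs in by_rel.items():
        messages = [(t, rel_weight[r] * features[h]) for h, t in pairs]
        peak = max(peak, len(messages))
        for t, m in messages:
            out[t] += m
    return dict(out), peak


if __name__ == "__main__":
    features = {0: 1.0, 1: 2.0, 2: 3.0}
    rel_weight = {"r1": 2.0, "r2": 0.5}
    edges = [(0, "r1", 1), (1, "r2", 2), (2, "r1", 1), (0, "r2", 2)]
    print(aggregate_parallel(features, edges, rel_weight))
    print(aggregate_iterative(features, edges, rel_weight))
```

In the toy run both calls return the same per-entity sums, but the parallel version holds all four messages at once while the iterative version never holds more than two.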

Print the maximum GPU memory allocated by setting the argument -log_gpu_mem.

If you get an out-of-memory error, try clearing cached GPU memory by passing the argument -empty_gpu_cache.

The learned model is automatically saved in the directory /checkpoints. Pass the argument -restore to resume from a saved model, e.g.:

-restore -name your_saved_model_name

Choose a specific data file as the test set by setting -test_data:

-test_data test(default)/1-n/n-1/.../ent100/..
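
To evaluate one saved model on several subsets, the commands can be generated in a loop. The sketch below only assembles and prints the command lines using the flags documented above (-name, -restore, -test_data, -data); `build_eval_command` is a hypothetical helper. Whether run.py skips further training when -restore is passed, and which model flags (such as -model or -gcn_layer) must be repeated, is not stated here, so treat the output as a starting point and pass anything else through `extra_args`.

```python
import shlex


def build_eval_command(model_name, test_data, dataset=None, extra_args=()):
    """Assemble a run.py call that restores a saved model and selects one test file."""
    cmd = ["python", "run.py", "-name", model_name, "-restore",
           "-test_data", test_data]
    if dataset is not None:  # FB15k-237 is the default dataset, so -data is optional
        cmd += ["-data", dataset]
    return cmd + list(extra_args)


if __name__ == "__main__":
    # Subset names follow the FB15k-237 file listing (without the .txt suffix).
    for subset in ["test", "1-1", "1-n", "n-1", "n-n"]:
        print(shlex.join(build_eval_command("your_saved_model_name", subset)))
```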

Distribution statistics

Statistics of different relation categories on FB15k-237:

	1-1	1-N	N-1	N-N
#Relations	17	26	86	108
#Training	4278	12536	50635	204666
#Validation	167	1043	3936	12389
#Test	192	1293	4508	14473
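
The four relation categories are conventionally assigned from the average number of tails per head and heads per tail of each relation, with 1.5 as the cut-off between "1" and "N". The sketch below implements that common convention; whether the subsets in this repository were produced with exactly this rule and threshold is an assumption, and `categorize_relations` is an illustrative name.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Map each relation to '1-1', '1-n', 'n-1', or 'n-n'.

    For each relation, compute the average number of distinct tails per head
    and distinct heads per tail; an average below `threshold` counts as "1".
    """
    tails = defaultdict(set)  # (relation, head) -> distinct tails
    heads = defaultdict(set)  # (relation, tail) -> distinct heads
    for h, r, t in triples:
        tails[(r, h)].add(t)
        heads[(r, t)].add(h)

    tph_sizes = defaultdict(list)  # relation -> tails-per-head counts
    hpt_sizes = defaultdict(list)  # relation -> heads-per-tail counts
    for (r, _), ts in tails.items():
        tph_sizes[r].append(len(ts))
    for (r, _), hs in heads.items():
        hpt_sizes[r].append(len(hs))

    categories = {}
    for r in tph_sizes:
        tph = sum(tph_sizes[r]) / len(tph_sizes[r])
        hpt = sum(hpt_sizes[r]) / len(hpt_sizes[r])
        head_side = "1" if hpt < threshold else "n"
        tail_side = "1" if tph < threshold else "n"
        categories[r] = f"{head_side}-{tail_side}"
    return categories


if __name__ == "__main__":
    toy = [("a", "capital_of", "x"), ("b", "capital_of", "y"),
           ("p", "has_part", "q1"), ("p", "has_part", "q2"),
           ("c1", "born_in", "city"), ("c2", "born_in", "city")]
    print(categorize_relations(toy))
```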

Statistics of different degree scopes on FB15k-237:

Entity degree scopes	#Entities	#Test
[0, 100)	13839	16385
[100, 200)	496	2055
[200, 300)	76	525
[300, 400)	44	493
[400, 500)	35	389
[500, 1000)	35	373
[1000, max)	16	246

Statistics of different degree scopes on WN18RR:

Entity degree scopes	#Entities	#Test
[0, 10)	38102	2595
[10, 25)	2497	417
[25, 50)	243	47
[50, 100)	65	29
[100, 500)	36	46
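
The degree scopes are half-open ranges, so an entity with degree exactly 100 on FB15k-237 falls in [100, 200), not [0, 100). A minimal sketch of the bucketing follows. It assumes that an entity's degree is the number of training triples it appears in as head or tail, which is not stated above; the function and constant names are illustrative, and the bounds are the FB15k-237 ones from the file listing.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the FB15k-237 degree scopes; the last bucket is [1000, max).
FB15K237_BOUNDS = [100, 200, 300, 400, 500, 1000]
FB15K237_FILES = ["ent100", "ent200", "ent300", "ent400",
                  "ent500", "ent1000", "entmax"]


def entity_degrees(train_triples):
    """Count how many training triples each entity takes part in (as head or tail)."""
    degree = Counter()
    for h, _, t in train_triples:
        degree[h] += 1
        degree[t] += 1
    return degree


def bucket_name(degree, bounds=FB15K237_BOUNDS, names=FB15K237_FILES):
    """Return the subset name whose half-open range [low, high) contains `degree`."""
    return names[bisect_right(bounds, degree)]


if __name__ == "__main__":
    degrees = entity_degrees([("a", "r", "b"), ("a", "r", "c")])
    for entity, deg in degrees.items():
        print(entity, deg, bucket_name(deg))
```

For WN18RR, the same function can be called with the bounds and file names from its own listing.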

Acknowledgment

Our project is based on CompGCN.

liuxiyang641 / HKGN