
LineaRE

Source code for the ICDM 2020 research paper "LineaRE: Simple but Powerful Knowledge Graph Embedding for Link Prediction".
You can also easily add your own model to the code framework.

Update!

We reorganized and optimized the code (see /new code). The new version has a clearer logical structure, runs faster, and supports multi-GPU parallel training to accelerate training further.

Code

Running LineaRE is easy:

  1. Put your arguments in the JSON files under ./config/, e.g. config_FB15k.json (a sample file is sketched below).
  2. Execute: python3 main.py
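
For illustration, a config_FB15k.json might look like the sketch below. The values shown are placeholders, not tuned settings; the Parameters section further down explains each field.

{
  "dim": 500,
  "norm_p": 1,
  "alpha": 1.0,
  "beta": 1.0,
  "gamma": 8.0,
  "learning_rate": 0.0001,
  "decay_rate": 0.1,
  "batch_size": 1024,
  "neg_size": 256,
  "regularization": 0.0,
  "drop_rate": 0.0,
  "test_batch_size": 16,
  "data_path": "/root/Drive/Datasets/FB15k",
  "save_path": "./save_path/FB15k",
  "max_step": 100000,
  "valid_step": 10000,
  "log_step": 100,
  "test_log_step": 1000,
  "optimizer": "Adam",
  "init_checkpoint": false,
  "use_old_optimizer": false,
  "sampling_rate": 0.001,
  "sampling_bias": 0.0,
  "device": "cuda:0",
  "multiGPU": false
}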

Code files

There are six Python files in total:

  • configure.py: holds all hyper-parameters, read from ./config/*.json;
  • data.py: the data loader; a KG class containing all data in a dataset;
  • lineare.py: the implementation of the LineaRE model;
  • main.py: the entry point of the program; creates a KG object and a TrainTest object, then starts training and testing (see the sketch after this list);
  • traintest.py: receives a KG object and a model, and describes the training and testing process;
  • utils.py: some model-independent tools.
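
A minimal sketch of how these pieces fit together. The class names KG and TrainTest come from the files above; the constructor signatures and method names (Config, train, test) are illustrative assumptions, not the exact API:

# Hypothetical glue code mirroring main.py; names such as Config,
# runner.train() and runner.test() are illustrative assumptions.
from configure import Config
from data import KG
from traintest import TrainTest

config = Config("./config/config_FB15k.json")  # read hyper-parameters
kg = KG(config)                                # load train/valid/test triples
runner = TrainTest(kg, config)                 # wires the model, data and loop
runner.train()                                 # train, validating periodically
runner.test()                                  # final link-prediction metrics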

Dependencies

  • Python 3
  • PyTorch 1.*
  • NumPy

Datasets

Four datasets are used: FB15k, WN18, FB15k-237, and WN18RR (the same as in [1]). Each dataset directory contains the following files (a loading sketch follows the list):

  • entities.dict: a dictionary mapping entities to unique ids;
  • relations.dict: a dictionary mapping relations to unique ids;
  • train.txt: the triples the KGE model is trained to fit;
  • valid.txt: validation triples (create an empty file if no validation data is available);
  • test.txt: the triples the KGE model is evaluated on.
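
A sketch of loading these files, assuming the tab-separated layout used by the datasets of [1] ("<id>\t<name>" per line in the .dict files, "<head>\t<relation>\t<tail>" per line in the .txt files):

# Sketch: read the dictionary and triple files (the tab-separated
# format is an assumption based on the datasets of [1]).
def read_dict(path):
    """Map entity/relation names to their integer ids."""
    with open(path) as f:
        return {name: int(idx)
                for idx, name in (line.strip().split("\t") for line in f)}

def read_triples(path, ent2id, rel2id):
    """Convert (head, relation, tail) lines into id triples."""
    with open(path) as f:
        return [(ent2id[h], rel2id[r], ent2id[t])
                for h, r, t in (line.strip().split("\t") for line in f)]

ent2id = read_dict("FB15k/entities.dict")
rel2id = read_dict("FB15k/relations.dict")
train_triples = read_triples("FB15k/train.txt", ent2id, rel2id)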

Parameters (./config/config_FB15k.json)

  • dim: the dimension (size) of the embeddings;
  • norm_p: the norm used in the score function, L1 or L2;
  • alpha: the temperature of self-adversarial negative sampling [1], Eq. (3) in our paper (see the formulas after this list);
  • beta: a hyper-parameter of softplus (as in PyTorch), Eq. (4);
  • gamma: a hyper-parameter in the loss function, Eq. (5), a margin used to separate positive samples from negative samples;
  • learning_rate: the initial learning rate, decayed during training;
  • decay_rate: the learning rate decay rate;
  • batch_size: the number of positive samples per training batch;
  • neg_size: the number of negative samples per positive sample in an optimization step;
  • regularization: the regularization coefficient;
  • drop_rate: dimensions of the embeddings are randomly dropped with probability drop_rate;
  • test_batch_size: the batch size used during evaluation;
  • data_path: path to the dataset directory, e.g. "/root/Drive/Datasets/FB15k";
  • save_path: where checkpoints are saved, e.g. "./save_path/FB15k";
  • max_step: the total number of training steps;
  • valid_step: validate the model every valid_step training steps;
  • log_step: log the average loss every log_step training steps;
  • test_log_step: log progress every test_log_step steps during testing;
  • optimizer: the optimizer to use (SGD, Adam, ...);
  • init_checkpoint: whether to load the model from a checkpoint file;
  • use_old_optimizer: if init_checkpoint is true, whether to load the stored optimizer state or use a new optimizer;
  • sampling_rate: together with sampling_bias, assigns a weight to each triple, as in word2vec subsampling;
  • sampling_bias: see sampling_rate;
  • device: the compute device ('cuda:0', 'cpu', ...);
  • multiGPU: whether to use multiple GPUs for parallel training.
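
For reference, alpha and beta enter two standard formulas: the self-adversarial weighting of [1], which alpha tempers, and PyTorch's softplus. This is a sketch in the notation of [1], not the paper's exact Eqs. (3)-(5); in the loss, gamma plays the role of a margin.

$$p\big(h'_j, r, t'_j\big) = \frac{\exp\big(\alpha\, f_r(h'_j, t'_j)\big)}{\sum_i \exp\big(\alpha\, f_r(h'_i, t'_i)\big)}, \qquad \operatorname{softplus}_\beta(x) = \frac{1}{\beta}\log\big(1 + e^{\beta x}\big)$$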

Citation

If you use this model or code, please cite it as follows:

@inproceedings{peng2020lineare,
  author    = {Yanhui Peng and Jing Zhang},
  editor    = {Claudia Plant and Haixun Wang and Alfredo Cuzzocrea and Carlo Zaniolo and Xindong Wu},
  title     = {LineaRE: Simple but Powerful Knowledge Graph Embedding for Link Prediction},
  booktitle = {IEEE International Conference on Data Mining, {ICDM}},
  pages     = {422--431},
  year      = {2020},
  url       = {https://ieeexplore.ieee.org/document/9338434}
}

References

[1] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In ICLR, 2019.
