THUDM / SelfKG

Codes for WWW2022 accepted paper: SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs


SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

Original implementation for paper SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs.

This paper was accepted at The Web Conference 2022 and nominated for best paper! 😆

SelfKG is the first self-supervised entity alignment method that requires no label supervision, and it matches or comes close to state-of-the-art supervised baselines. Its performance suggests that self-supervised learning has great potential for entity alignment in knowledge graphs.

SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

https://doi.org/10.1145/3485447.3511945

Installation

Requirements

torch==1.9.0
faiss-cpu==1.7.1
numpy==1.19.2
pandas==1.0.5
tqdm==4.61.1
transformers==4.8.2
torchtext==0.10.0
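
To confirm that an environment matches the pinned versions above, a small check along the following lines may help. This is a minimal sketch, not part of the SelfKG repository: the function name `check_requirements` is hypothetical, and it assumes that a local build suffix such as `+cu111` on the installed version can be ignored when comparing.

```python
from importlib import metadata

# Pinned versions copied from the Requirements list in this README.
PINNED = {
    "torch": "1.9.0",
    "faiss-cpu": "1.7.1",
    "numpy": "1.19.2",
    "pandas": "1.0.5",
    "tqdm": "4.61.1",
    "transformers": "4.8.2",
    "torchtext": "0.10.0",
}


def check_requirements(pinned=PINNED, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or mismatched packages.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Assumption: ignore a local build suffix, e.g. "1.9.0+cu111" -> "1.9.0".
        base = found.split("+")[0] if found else None
        if base != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_requirements().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.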

You can use setup.sh to set up your Anaconda environment by running:

bash setup.sh

Quick Start

Data Preparation

You can download our data from here. The final structure of the project should be:

├── data
│   ├── DBP15K
│   │   ├── fr_en
│   │   ├── ja_en
│   │   └── zh_en
│   ├── DWY100K
│   │   ├── dbp_wd
│   │   └── dbp_yg
│   ├── LaBSE
│   │   ├── bert_config.json
│   │   ├── bert_model.ckpt.index
│   │   ├── checkpoint
│   │   ├── config.json
│   │   ├── pytorch_model.bin
│   │   └── vocab.txt
│   └── getdata.sh
├── loader
├── model
├── run.sh # Please use this bash to run the experiments!
├── run_DWY_LaBSE_neighbor.py # SelfKG on DWY100k
├── run_LaBSE_neighbor.py # SelfKG on DBP15k
... # run_LaBSE_*.py # Ablation code will be available soon
├── script
│   └── preprocess
├── settings.py
└── setup.sh # Can be used to set up your Anaconda environment

You can also use the following scripts to download the datasets directly:

cd data
bash getdata.sh # Download speed depends on your network connection. If it is slow, download the datasets directly from the website mentioned above.

⭐ Run Experiments

Please use

bash run.sh

to reproduce our experimental results. For more details, please refer to run.sh and our code.

❗ Common Issues

"XXX file not found"
Please make sure you have downloaded all the datasets as described in this README.
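
One way to narrow down a "file not found" error is to compare the local checkout with the project structure listed under Data Preparation. The snippet below is a minimal sketch rather than part of the repository: the helper name `missing_paths` is hypothetical, and the path list is copied from the directory tree in this README, so it only checks that those entries exist, not that their contents are complete.

```python
from pathlib import Path

# Paths taken from the project structure in this README,
# relative to the repository root.
EXPECTED = [
    "data/DBP15K/fr_en",
    "data/DBP15K/ja_en",
    "data/DBP15K/zh_en",
    "data/DWY100K/dbp_wd",
    "data/DWY100K/dbp_yg",
    "data/LaBSE/bert_config.json",
    "data/LaBSE/bert_model.ckpt.index",
    "data/LaBSE/checkpoint",
    "data/LaBSE/config.json",
    "data/LaBSE/pytorch_model.bin",
    "data/LaBSE/vocab.txt",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under `root`, in list order."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if everything is present.
    for path in missing_paths():
        print(f"missing: {path}")
```

Any path it prints points to a dataset or LaBSE file that still needs to be downloaded.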

to be continued ...

Citing SelfKG

If you use SelfKG in your research or wish to refer to the baseline results, please use the following BibTeX.

@article{DBLP:journals/corr/abs-2203-01044,
  author    = {Xiao Liu and
               Haoyun Hong and
               Xinghao Wang and
               Zeyi Chen and
               Evgeny Kharlamov and
               Yuxiao Dong and
               Jie Tang},
  title     = {SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs},
  journal   = {CoRR},
  volume    = {abs/2203.01044},
  year      = {2022},
  url       = {https://arxiv.org/abs/2203.01044},
  eprinttype = {arXiv},
  eprint    = {2203.01044},
  timestamp = {Mon, 07 Mar 2022 16:29:57 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2203-01044.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License: MIT License
