This is a refactored version of DCA. It contains only the basic functionality needed to reproduce the experiments, excluding all ablation studies.
- Data and some other sources: data.zip
unzip data.zip
- Install Python packages
pip install -r requirements.txt
- Generate prior
python preprocess_prior.py \
    --wiki_preprocess preprocess_2021candidate \
    --cpu 36
- Correct some wrong entities in the prior
python preprocess_priormapping.py \
    --wiki_preprocess preprocess_2021candidate \
    --cpu 36
- Generate candidates
python preprocess_prior2candidate.py \
    --wiki_preprocess preprocess_2021candidate
- Generate vocabulary
python preprocess_vocab.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2021dca
- Generate entity word co-occurrence
python preprocess_entitypage.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2014dca \
    --cpu 36
- Generate inlinks from wiki dump
python preprocess_inlinks.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2014dca \
    --cpu 36
Change both --wiki_preprocess and --dataroot to preprocess_2021candidate for entity pre-training on new candidates.
- Train on old candidates
python train_entity_embedding.py \
    --dataroot ./data_2014_2021dca \
    --wiki_preprocess ./preprocess_2014candidate \
    --logdir ./logs/2014_2021entitypretrain
- Train on new candidates
python train_entity_embedding.py \
    --dataroot ./data_2014_2021dca \
    --wiki_preprocess ./preprocess_2021candidate \
    --logdir ./logs/2021_2021entitypretrain
- Supervised Learning
python main.py --mode train --method SL --logdir ./logs/SL
- Reinforcement Learning
python main.py --mode train --method RL --logdir ./logs/RL
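The commands above can also be driven from one small script. The following is a minimal sketch, not part of this repository: the names `STEPS` and `run_all` are hypothetical, the flags are copied verbatim from the commands listed above, and running the steps in the listed order is an assumption. By default it only prints each command (a dry run).

```python
import shlex
import subprocess
from typing import List

# Commands copied from the steps listed above, in the same order.
# Assumption: they are run from the repository root after
# `unzip data.zip` and `pip install -r requirements.txt`.
STEPS: List[List[str]] = [
    ["python", "preprocess_prior.py",
     "--wiki_preprocess", "preprocess_2021candidate", "--cpu", "36"],
    ["python", "preprocess_priormapping.py",
     "--wiki_preprocess", "preprocess_2021candidate", "--cpu", "36"],
    ["python", "preprocess_prior2candidate.py",
     "--wiki_preprocess", "preprocess_2021candidate"],
    ["python", "preprocess_vocab.py",
     "--wiki_preprocess", "preprocess_2014candidate",
     "--dataroot", "data_2014_2021dca"],
    ["python", "preprocess_entitypage.py",
     "--wiki_preprocess", "preprocess_2014candidate",
     "--dataroot", "data_2014_2014dca", "--cpu", "36"],
    ["python", "preprocess_inlinks.py",
     "--wiki_preprocess", "preprocess_2014candidate",
     "--dataroot", "data_2014_2014dca", "--cpu", "36"],
    ["python", "train_entity_embedding.py",
     "--dataroot", "./data_2014_2021dca",
     "--wiki_preprocess", "./preprocess_2014candidate",
     "--logdir", "./logs/2014_2021entitypretrain"],
    ["python", "train_entity_embedding.py",
     "--dataroot", "./data_2014_2021dca",
     "--wiki_preprocess", "./preprocess_2021candidate",
     "--logdir", "./logs/2021_2021entitypretrain"],
    ["python", "main.py", "--mode", "train", "--method", "SL",
     "--logdir", "./logs/SL"],
    ["python", "main.py", "--mode", "train", "--method", "RL",
     "--logdir", "./logs/RL"],
]


def run_all(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it too unless dry_run is True.

    Returns the rendered command lines. With dry_run=False, the first
    failing step raises subprocess.CalledProcessError and stops the run.
    """
    rendered = []
    for step in steps:
        line = shlex.join(step)  # shell-safe rendering, Python 3.8+
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(step, check=True)
    return rendered
```

Calling `run_all(STEPS)` prints the commands for inspection; `run_all(STEPS, dry_run=False)` executes them one after another and stops at the first failure, so a broken preprocessing step does not silently feed into training.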
- Gold recall after preranking
Name: aida-train, #batch: 953, #mention: 18258, recall: 1.000000
Name: aida-A    , #batch: 218, #mention:  4791, recall: 0.977249
Name: aida-B    , #batch: 232, #mention:  4485, recall: 0.986622
Name: msnbc     , #batch:  20, #mention:   656, recall: 0.984756
Name: aquaint   , #batch:  50, #mention:   727, recall: 0.940853
Name: ace2004   , #batch:  35, #mention:   257, recall: 0.914397
Name: clueweb   , #batch: 320, #mention: 11154, recall: 0.919042
Name: wikipedia , #batch: 319, #mention:  6821, recall: 0.932708
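The recall figures above are read here as the fraction of mentions whose gold entity is still among the candidates kept after preranking. A minimal sketch of that metric follows. The function `gold_recall` and the toy data are hypothetical illustrations, not code or data from this repository, and the repository's own computation may differ in detail.

```python
from typing import Sequence


def gold_recall(kept_candidates: Sequence[Sequence[str]],
                gold_entities: Sequence[str]) -> float:
    """Fraction of mentions whose gold entity is among the kept candidates.

    kept_candidates[i] holds the candidates that survive preranking for
    mention i; gold_entities[i] is the gold entity for the same mention.
    """
    if len(kept_candidates) != len(gold_entities):
        raise ValueError("one candidate list is required per mention")
    if not gold_entities:
        return 0.0  # convention chosen for this sketch: no mentions, recall 0
    hits = sum(1 for cands, gold in zip(kept_candidates, gold_entities)
               if gold in cands)
    return hits / len(gold_entities)


# Toy example (made-up data): 2 of the 4 gold entities survive preranking.
kept = [
    ["Paris", "Paris_Hilton"],
    ["Apple_Inc."],
    ["Jordan", "Michael_Jordan"],
    ["Java_(island)"],
]
gold = ["Paris", "Apple", "Michael_Jordan", "Java_(programming_language)"]
print(f"recall: {gold_recall(kept, gold):.6f}")  # recall: 0.500000
```

A recall below 1.0 on an evaluation set bounds the accuracy any downstream ranking model can reach on it, since a mention whose gold entity was dropped during preranking cannot be linked correctly.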
If you find the implementation useful, please cite the original paper: Learning Dynamic Context Augmentation for Global Entity Linking.
@inproceedings{yang2019learning,
title={Learning Dynamic Context Augmentation for Global Entity Linking},
author={Yang, Xiyuan and Gu, Xiaotao and Lin, Sheng and Tang, Siliang and Zhuang, Yueting and Wu, Fei and Chen, Zhigang and Hu, Guoping and Ren, Xiang},
booktitle = {Proceedings of EMNLP-IJCNLP},
year={2019}
}