Learning Dynamic Context Augmentation (DCA) for Global Entity Linking

Original Readme

This is a refactored version of DCA. It contains only the basic functionality needed to reproduce the experiments and excludes all ablation studies.

Prepare

Data and other resources: data.zip
```
unzip data.zip
```
Install Python packages
```
pip install -r requirements.txt
```

[Optional] Preparing New Candidates

Generate prior

```
python preprocess_prior.py \
    --wiki_preprocess preprocess_2021candidate \
    --cpu 36
```

Correct wrong entities in the prior

```
python preprocess_priormapping.py \
    --wiki_preprocess preprocess_2021candidate \
    --cpu 36
```

Generate candidates

```
python preprocess_prior2candidate.py \
    --wiki_preprocess preprocess_2021candidate
```
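The three candidate-preparation steps above can be chained from one small driver. The sketch below is only an illustration: it assumes the steps must run in the order listed and that the scripts live in the current directory. The helper names `candidate_commands` and `run_all` are hypothetical and are not part of this repository.

```python
import subprocess
import sys


def candidate_commands(wiki_preprocess="preprocess_2021candidate", cpu=36):
    """Build the three commands from this section as argument lists.

    The script names and flags are the ones shown in this README;
    the ordering is assumed to matter (prior, then mapping, then candidates).
    """
    py = sys.executable
    return [
        [py, "preprocess_prior.py",
         "--wiki_preprocess", wiki_preprocess, "--cpu", str(cpu)],
        [py, "preprocess_priormapping.py",
         "--wiki_preprocess", wiki_preprocess, "--cpu", str(cpu)],
        # The last step takes no --cpu flag in the README.
        [py, "preprocess_prior2candidate.py",
         "--wiki_preprocess", wiki_preprocess],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(candidate_commands(), dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to check the arguments before starting a long preprocessing job.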

Preparing Entity Pretraining

Generate vocabulary

```
python preprocess_vocab.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2021dca
```

Generate entity-word co-occurrences

```
python preprocess_entitypage.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2014dca \
    --cpu 36
```

Generate inlinks from the wiki dump

```
python preprocess_inlinks.py \
    --wiki_preprocess preprocess_2014candidate \
    --dataroot data_2014_2014dca \
    --cpu 36
```

For entity pretraining on the new candidates, change both --wiki_preprocess and --dataroot to preprocess_2021candidate.

Entity Pretraining

Train on the old candidates

```
python train_entity_embedding.py \
    --dataroot ./data_2014_2021dca \
    --wiki_preprocess ./preprocess_2014candidate \
    --logdir ./logs/2014_2021entitypretrain
```

Train on the new candidates

```
python train_entity_embedding.py \
    --dataroot ./data_2014_2021dca \
    --wiki_preprocess ./preprocess_2021candidate \
    --logdir ./logs/2021_2021entitypretrain
```

DCA Training

Supervised Learning

```
python main.py --mode train --method SL --logdir ./logs/SL
```

Reinforcement Learning

```
python main.py --mode train --method RL --logdir ./logs/RL
```

Notes

Gold recall after preranking

```
Name: aida-train, #batch: 953, #mention: 18258, recall: 1.000000
Name: aida-A    , #batch: 218, #mention:  4791, recall: 0.977249
Name: aida-B    , #batch: 232, #mention:  4485, recall: 0.986622
Name: msnbc     , #batch:  20, #mention:   656, recall: 0.984756
Name: aquaint   , #batch:  50, #mention:   727, recall: 0.940853
Name: ace2004   , #batch:  35, #mention:   257, recall: 0.914397
Name: clueweb   , #batch: 320, #mention: 11154, recall: 0.919042
Name: wikipedia , #batch: 319, #mention:  6821, recall: 0.932708
```
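To read these numbers in terms of mentions rather than ratios, the log lines can be parsed and converted back to counts. The sketch below assumes that gold recall is the fraction of mentions whose gold entity is still among the candidates after preranking, so `#mention * (1 - recall)` approximates how many mentions can no longer be linked correctly. The names `parse_recall_log` and `missed_mentions` are hypothetical helpers, and `SAMPLE_LOG` holds three lines copied from the table above.

```python
import re

# Three lines copied from the recall table in this README.
SAMPLE_LOG = """\
Name: aida-train, #batch: 953, #mention: 18258, recall: 1.000000
Name: aida-A    , #batch: 218, #mention:  4791, recall: 0.977249
Name: ace2004   , #batch:  35, #mention:   257, recall: 0.914397
"""

# Matches one log line; whitespace padding around fields is tolerated.
LINE_RE = re.compile(
    r"Name:\s*(?P<name>\S+)\s*,\s*#batch:\s*(?P<batch>\d+),\s*"
    r"#mention:\s*(?P<mention>\d+),\s*recall:\s*(?P<recall>[0-9.]+)"
)


def parse_recall_log(text):
    """Return one dict per matching log line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append({
                "name": match.group("name"),
                "batch": int(match.group("batch")),
                "mention": int(match.group("mention")),
                "recall": float(match.group("recall")),
            })
    return rows


def missed_mentions(row):
    """Approximate count of mentions whose gold entity was pruned.

    Assumes recall = retained_gold / mentions, as described above.
    """
    return round(row["mention"] * (1.0 - row["recall"]))


if __name__ == "__main__":
    for row in parse_recall_log(SAMPLE_LOG):
        print(row["name"], row["mention"], missed_mentions(row))
```

Under that assumption, the missed count is a ceiling on what any downstream linker can recover for a dataset, since a pruned gold entity cannot be selected later.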

Citation

If you find the implementation useful, please cite the original paper: Learning Dynamic Context Augmentation for Global Entity Linking.

```
@inproceedings{yang2019learning,
  title={Learning Dynamic Context Augmentation for Global Entity Linking},
  author={Yang, Xiyuan and Gu, Xiaotao and Lin, Sheng and Tang, Siliang and Zhuang, Yueting and Wu, Fei and Chen, Zhigang and Hu, Guoping and Ren, Xiang},
  booktitle={Proceedings of EMNLP-IJCNLP},
  year={2019}
}
```
