w86763777 / DCA

This is a refactored code base of Learning Dynamic Context Augmentation (DCA) for Global Entity Linking.

Home Page: https://arxiv.org/abs/1909.02117

Learning Dynamic Context Augmentation (DCA) for Global Entity Linking

Original Readme

This is a refactored version of DCA. It contains only the basic functionality needed to reproduce the experiments; the ablation studies are not included.

Prepare

  • Data and other resources: data.zip
    unzip data.zip
    
  • Install Python packages
    pip install -r requirements.txt
    

[Optional] Preparing New Candidates

  1. Generate the prior

    python preprocess_prior.py \
        --wiki_preprocess preprocess_2021candidate \
        --cpu 36
    
  2. Correct some wrong entities in the prior

    python preprocess_priormapping.py \
        --wiki_preprocess preprocess_2021candidate \
        --cpu 36
    
  3. Generate candidates

    python preprocess_prior2candidate.py \
        --wiki_preprocess preprocess_2021candidate
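
As a rough illustration of how a prior can lead to a candidate list, the sketch below assumes the prior is the usual mention-entity prior p(entity | mention), estimated from (mention, entity) pairs, and keeps the top-k entities per mention. It is not the code in preprocess_prior.py or preprocess_prior2candidate.py; the function names, the toy pairs, and the cutoff k are all hypothetical.

```python
from collections import Counter, defaultdict


def build_prior(anchor_pairs):
    """Estimate p(entity | mention) from (mention, entity) pairs (hypothetical helper)."""
    counts = defaultdict(Counter)
    for mention, entity in anchor_pairs:
        counts[mention][entity] += 1
    prior = {}
    for mention, entity_counts in counts.items():
        total = sum(entity_counts.values())
        # Normalize the counts so the probabilities for each mention sum to 1.
        prior[mention] = {e: c / total for e, c in entity_counts.items()}
    return prior


def top_candidates(prior, mention, k=30):
    """Return up to k (entity, probability) pairs, highest prior first (k is an assumption)."""
    ranked = sorted(prior.get(mention, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Toy data, invented for illustration only.
pairs = [
    ("Paris", "Paris"), ("Paris", "Paris"), ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"), ("Paris", "Paris_Hilton"),
    ("Paris", "Paris,_Texas"),
]
prior = build_prior(pairs)
candidates = top_candidates(prior, "Paris", k=2)
print(candidates)
```

A mention that never occurs in the pairs simply gets an empty candidate list in this sketch.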
    

Preparing for Entity Pretraining

  1. Generate vocabulary

    python preprocess_vocab.py \
        --wiki_preprocess preprocess_2014candidate \
        --dataroot data_2014_2021dca
    
  2. Generate entity-word co-occurrences

    python preprocess_entitypage.py \
        --wiki_preprocess preprocess_2014candidate \
        --dataroot data_2014_2014dca \
        --cpu 36
    
  3. Generate inlinks from the wiki dump

    python preprocess_inlinks.py \
        --wiki_preprocess preprocess_2014candidate \
        --dataroot data_2014_2014dca \
        --cpu 36
    

For entity pretraining on the new candidates, change both --wiki_preprocess and --dataroot to preprocess_2021candidate.
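
To make the last two preparation steps more concrete, here is a minimal sketch of what entity-word co-occurrence counts and an inlink index could look like. It is an assumption about the general idea, not the logic of preprocess_entitypage.py or preprocess_inlinks.py; the function names, the whitespace tokenization, and the toy pages are hypothetical.

```python
from collections import Counter, defaultdict


def entity_word_counts(pages, vocab):
    """Count, per entity, how often each in-vocabulary word appears on its page."""
    counts = {}
    for entity, text in pages.items():
        tokens = text.lower().split()  # naive tokenization, for illustration only
        counts[entity] = Counter(t for t in tokens if t in vocab)
    return counts


def build_inlinks(outlinks):
    """Invert a page -> linked entities mapping into entity -> pages linking to it."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks


# Toy inputs, invented for illustration only.
pages = {
    "Paris": "Paris is the capital of France",
    "Paris_Hilton": "American media personality",
}
vocab = {"capital", "france", "media", "american"}
outlinks = {"Paris": ["France"], "Lyon": ["France", "Paris"]}

cooc = entity_word_counts(pages, vocab)
inlinks = build_inlinks(outlinks)
```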

Entity Pretraining

  • Train on old candidates

    python train_entity_embedding.py \
        --dataroot ./data_2014_2021dca \
        --wiki_preprocess ./preprocess_2014candidate \
        --logdir ./logs/2014_2021entitypretrain
    
  • Train on new candidates

    python train_entity_embedding.py \
        --dataroot ./data_2014_2021dca \
        --wiki_preprocess ./preprocess_2021candidate \
        --logdir ./logs/2021_2021entitypretrain
    

DCA Training

  • Supervised Learning
    python main.py --mode train --method SL --logdir ./logs/SL
    
  • Reinforcement Learning
    python main.py --mode train --method RL --logdir ./logs/RL
    

Notes

  • Gold recall after preranking
    Name: aida-train, #batch: 953, #mention: 18258, recall: 1.000000
    Name: aida-A    , #batch: 218, #mention:  4791, recall: 0.977249
    Name: aida-B    , #batch: 232, #mention:  4485, recall: 0.986622
    Name: msnbc     , #batch:  20, #mention:   656, recall: 0.984756
    Name: aquaint   , #batch:  50, #mention:   727, recall: 0.940853
    Name: ace2004   , #batch:  35, #mention:   257, recall: 0.914397
    Name: clueweb   , #batch: 320, #mention: 11154, recall: 0.919042
    Name: wikipedia , #batch: 319, #mention:  6821, recall: 0.932708
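
The recall column presumably reports the fraction of mentions whose gold entity is still among the candidates after preranking. A minimal sketch of that computation follows; the function name and the toy inputs are hypothetical and do not reflect the repository's data structures.

```python
def gold_recall(candidate_lists, gold_entities):
    """Fraction of mentions whose gold entity appears in its candidate list."""
    if len(candidate_lists) != len(gold_entities):
        raise ValueError("need exactly one candidate list per mention")
    if not gold_entities:
        raise ValueError("no mentions given")
    hits = sum(gold in cands for cands, gold in zip(candidate_lists, gold_entities))
    return hits / len(gold_entities)


# Toy example: the gold entity survives for two of the three mentions.
cands = [["Paris", "Paris_Hilton"], ["Berlin"], ["Rome", "Roma_(film)"]]
gold = ["Paris", "Berlin,_New_Hampshire", "Rome"]
recall = gold_recall(cands, gold)
```

Read this way, the aida-A figure of 0.977249 over 4791 mentions would correspond to roughly 109 mentions whose gold entity was dropped before ranking.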
    

Citation

If you find the implementation useful, please cite the original paper: Learning Dynamic Context Augmentation for Global Entity Linking.

@inproceedings{yang2019learning,
  title={Learning Dynamic Context Augmentation for Global Entity Linking},
  author={Yang, Xiyuan and Gu, Xiaotao and Lin, Sheng and Tang, Siliang and Zhuang, Yueting and Wu, Fei and Chen, Zhigang and Hu, Guoping and Ren, Xiang},
  booktitle={Proceedings of EMNLP-IJCNLP},
  year={2019}
}
