ELDEN: Improved Entity Linking using Densified Knowledge Graphs

This software is the implementation of the paper "ELDEN: Improved Entity Linking using Densified Knowledge Graphs", to be presented at the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2018) in New Orleans, Louisiana, June 1 to June 6, 2018.

Requirements

Code is written in Python (2.7), Torch and Lua (LuaJIT).

Using the pre-trained word2vec vectors from gensim requires downloading them from https://radimrehurek.com/gensim/models/word2vec.html

Co-occurrence matrix and other data files can be downloaded from https://www.dropbox.com/s/wqduqde7pv8cr76/ELDEN_Corpus.tar.gz?dl=0

Running the models

This package contains the four steps (folders A to D) of implementation, followed by Evaluation. We suggest running the system in this order.

A. Corpus :

Wikipedia (cleaned as specified in the paper)
Web Corpus = trainingEntities.py, processMultipleEntities.py, WebScraping.py

B. Dataset :

TAC2010 = TACforNED
CoNLL = https://github.com/masha-p/PPRforNED

Please cite the respective papers when using these datasets.

C. Preprocess:

Create entity co-location index. python2.7 pmi_index.py base_co.npy/None vocab.pickle output_file file_scraped_from_web
Start PMI Server. python pmi_service.py
Train entity embeddings. th> main.lua <<word2vec.lua>>
Start Embedding Distance Servers. th> EDServer.lua
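
The PMI index built in the first step can be pictured with a small sketch. This is an illustration only, not the code in pmi_index.py: the function names, the toy documents and the choice of counting each entity once per document are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence(documents):
    """Count entities and entity pairs, at most once per document.

    `documents` is a list of entity-name lists (hypothetical input format).
    """
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in documents:
        unique = sorted(set(entities))
        entity_counts.update(unique)
        # Sorted input makes every pair key a canonical (smaller, larger) tuple.
        pair_counts.update(combinations(unique, 2))
    return entity_counts, pair_counts


def pmi(a, b, entity_counts, pair_counts, n_docs):
    """PMI(a, b) = log(p(a, b) / (p(a) * p(b))); None if the pair never co-occurs."""
    joint = pair_counts.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return None
    n = float(n_docs)  # float division on both Python 2.7 and Python 3
    p_ab = joint / n
    p_a = entity_counts[a] / n
    p_b = entity_counts[b] / n
    return math.log(p_ab / (p_a * p_b))


# Toy corpus (invented for illustration): each inner list is one document.
toy_documents = [
    ["Barack_Obama", "White_House"],
    ["Barack_Obama", "White_House", "Chicago"],
    ["Chicago", "Illinois"],
    ["Illinois", "Barack_Obama"],
]
toy_entities, toy_pairs = build_cooccurrence(toy_documents)
toy_score = pmi("Barack_Obama", "White_House", toy_entities, toy_pairs, len(toy_documents))
```

A positive score means the two entities appear together more often than their individual frequencies would predict. Pairs that never co-occur return None here rather than negative infinity; that is a design choice of this sketch.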

D. Entity Linker:

Create train and test dataset. python createTrainData.py
Run Entity Linker. python classify.py

E. Evaluation :

Head entities versus tail entities statistics. python TailEntities.py
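
A head versus tail breakdown can be sketched as follows. This is a minimal illustration, not the logic of TailEntities.py: the frequency table, the threshold of 10 and the input format of (gold, predicted) pairs are assumptions, and the script may define head and tail entities differently.

```python
from collections import defaultdict


def head_tail_accuracy(predictions, entity_frequency, threshold=10):
    """Return linking accuracy separately for head and tail entities.

    predictions: iterable of (gold_entity, predicted_entity) pairs.
    entity_frequency: dict mapping an entity to a count (hypothetical proxy
        for how well the entity is covered by the knowledge graph).
    threshold: entities with a count below this value are treated as tail
        (assumed cut-off, chosen only for the example).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, predicted in predictions:
        group = "head" if entity_frequency.get(gold, 0) >= threshold else "tail"
        total[group] += 1
        if gold == predicted:
            correct[group] += 1
    # Only groups that actually occur are reported, so there is no division by zero.
    return dict((group, correct[group] / float(total[group])) for group in total)


# Toy data (invented for illustration).
toy_frequency = {"A": 100, "B": 3, "C": 1}
toy_predictions = [("A", "A"), ("A", "X"), ("A", "A"), ("B", "B"), ("C", "B")]
toy_stats = head_tail_accuracy(toy_predictions, toy_frequency)
```

Reporting the two groups separately shows whether the linker improves on sparsely connected entities or only on frequent ones.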

Kindly cite the paper if you use the software.
