farishta4898 / CMPUT-656_Project

This repo outlines a research project inspired by the foundational work of Esquivel et al. (2017) on long-tail entities in news. Our project aims to leverage modern NLP tools, specifically spaCy for named entity recognition (NER) and BLINK for entity linking (EL), to address the challenges of identifying and linking long-tail entities in large news corpora.

CMPUT 656 Project: Revisiting Long-Tail Entities in News: A Modern Tool Approach

See the requirements.txt file for the required dependencies.
  1. To begin, run the named entity recognition (NER) tool on the dataset using the Spacy_NER notebook. Comments in the notebook indicate which code blocks to execute. The results used in our analysis are preserved in the notebook's output cells.

  2. Next, run the BLINK_Entity_Linking notebook to apply the entity linker, BLINK, to the dataset. Make sure the required dependencies are installed, and follow the comments in the notebook for execution instructions. The results used in our analysis are preserved in the notebook's output cells.
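
The hand-off between the two steps can be sketched as follows. This is a minimal illustration, not the code from the notebooks: it assumes the NER step yields character spans for each mention (spaCy exposes these as `ent.start_char` and `ent.end_char` on the entities in `doc.ents`) and converts them into the record layout shown in the BLINK README (`id`, `label`, `label_id`, `context_left`, `mention`, `context_right`). The helper name `to_blink_records`, the sample sentence, and the lowercasing choice are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def to_blink_records(text: str, spans: List[Tuple[int, int]]) -> List[Dict]:
    """Convert (start, end) character spans into BLINK-style input records.

    Hypothetical helper for illustration. With spaCy, the spans could be
    built as [(ent.start_char, ent.end_char) for ent in doc.ents].
    """
    records = []
    for idx, (start, end) in enumerate(spans):
        records.append({
            "id": idx,
            "label": "unknown",  # the gold entity is not known before linking
            "label_id": -1,
            # Lowercasing mirrors the BLINK README example (an assumption here).
            "context_left": text[:start].lower(),
            "mention": text[start:end].lower(),
            "context_right": text[end:].lower(),
        })
    return records


# Sample input: spans are hard-coded so the sketch runs without any model.
text = "Shakespeare wrote about Julius Caesar."
spans = [(0, 11), (24, 37)]
records = to_blink_records(text, spans)
```

Keeping the left and right context separate from the mention matters because BLINK scores a mention together with its surrounding text; long-tail entities in particular tend to be ambiguous without that context.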

We do not provide the Signal-1M dataset, or the subset used in this study, in this repository because we do not have the necessary license to share it.

License: GNU General Public License v3.0

Languages

Language: Jupyter Notebook 100.0%