dig-team / vagueness_project

Vagueness project (paper published at AKBC 2021).

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The Vagueness of Vagueness in Noun Phrases

This project is part of the NoRDF project and is a joint work between Pierre-Henri Paris, Syrine El Aoud and Fabian Suchanek (Télécom Paris, DIG team).

What is about?

Natural language text has a great potential to feed knowledge bases. However, natural language is not always precise - and sometimes intentionally so. In this position paper, we study vagueness in noun phrases. We manually analyze the frequency of vague noun phrases in a Wikipedia corpus, and find that 1/4 of noun phrases exhibit some form of vagueness. We report on their nature and propose a categorization. We then conduct a literature review and present different definitions of vagueness, and different existing methods to deal with the detection and modeling of vagueness. We find that, despite its frequency, vagueness has not yet be addressed in its entirety.

Content

All data are located in the data directory:

  • articles: this folder contains the 30 original Wikipedia articles.
  • annotation guideline.md: this file describes the process we followed to annotate the Wikipedia articles.
  • annotations: this folder contains the annotations of articles from the articles folder.
  • stats: this folder contains the stats we computed. You can check both the annotations and the code we used to compute those stats.

The analysis.py script can be executed to compute the stats corresponding to our annotations.

To cite this work

Pierre-Henri Paris, Syrine El Aoud and Fabian M. Suchanek. The Vagueness of Vagueness in Noun Phrases. In AKBC 2021.

Acknowledgments

This work was partially funded by the grant ANR-20-CHIA-0012-01 ("NoRDF").

About

Vagueness project (paper published at AKBC 2021).


Languages

Language:HTML 99.9%Language:Python 0.1%