CrowdTruth / FrameDisambiguation

Crowdsourced data for semantic frame disambiguation from sentences.

Frame Disambiguation with CrowdTruth

This repository contains a ground truth corpus for semantic frame disambiguation, acquired with crowdsourcing and processed with CrowdTruth metrics that capture ambiguity in annotations by measuring inter-annotator disagreement.

The dataset contains annotations for over 9000 sentence-word pairs from the FrameNet corpus v.1.7, with each sentence-word pair annotated for frame disambiguation by 15 workers. The crowdsourced data was collected from Amazon Mechanical Turk.
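To illustrate how disagreement among workers can signal ambiguity, the sketch below computes, for a single sentence-word pair, the fraction of workers who selected each frame. It is a simplified, unweighted stand-in and not the CrowdTruth metrics themselves, which also weigh in worker and annotation quality. The function name `frame_scores`, the frame names, and the votes are hypothetical, and the sketch assumes a worker may select more than one frame.

```python
from collections import Counter


def frame_scores(worker_choices):
    """Return the fraction of workers that selected each frame.

    worker_choices: one collection of frame names per worker.
    This is an unweighted illustration, not the CrowdTruth metric.
    """
    n_workers = len(worker_choices)
    if n_workers == 0:
        return {}
    # Count each frame at most once per worker.
    counts = Counter(frame for choices in worker_choices for frame in set(choices))
    return {frame: count / n_workers for frame, count in counts.items()}


# Made-up votes from 15 workers for one sentence-word pair.
votes = [["Motion"]] * 9 + [["Self_motion"]] * 4 + [["Motion", "Self_motion"]] * 2
scores = frame_scores(votes)
```

A pair where one frame scores close to 1 and the rest close to 0 shows high agreement, while several frames with mid-range scores suggest the sentence is ambiguous between them.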

The corpus has been referenced in the following papers:

To replicate the data processing from the paper, run the Jupyter notebook CrowdTruth metrics.ipynb. It requires the CrowdTruth metrics Python package, version 2.0 or later.
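Before opening the notebook, it may help to confirm that a suitable version of the package is installed. The following is a minimal sketch using only the standard library; the distribution name `crowdtruth` is an assumption, and the helper names `installed_version` and `meets_minimum` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="crowdtruth"):
    """Return the installed version string, or None if the package is absent.

    The default distribution name is an assumption; adjust it if needed.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def meets_minimum(version_string, minimum=(2, 0)):
    """Compare the leading numeric components of a version against a minimum."""
    numbers = tuple(int(n) for n in re.findall(r"\d+", version_string)[: len(minimum)])
    # Pad short versions such as "2" so they compare as (2, 0).
    numbers += (0,) * (len(minimum) - len(numbers))
    return numbers >= minimum


found = installed_version()
if found is None:
    print("CrowdTruth package not found")
else:
    print("CrowdTruth", found, "OK" if meets_minimum(found) else "is older than 2.0")
```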

The data aggregated with CrowdTruth metrics is available in the folder data/output/.

The raw crowdsourcing data is available in the folder data/input/.
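As a starting point for exploring these folders, the sketch below reads every CSV file in a folder into memory. It assumes the data files are comma-separated with a header row, which this README does not state; the helper name `load_csv_rows` is hypothetical.

```python
import csv
from pathlib import Path


def load_csv_rows(folder):
    """Read every .csv file in `folder` into {file name: list of row dicts}.

    Returns an empty dict if the folder does not exist.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return {}
    tables = {}
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


# Summarize the aggregated output, assuming the script runs from the repository root.
for name, rows in load_csv_rows("data/output").items():
    print(name, len(rows), "rows")
```

The same call with data/input as the argument would list the raw crowdsourcing files, under the same format assumption.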

If you find this data useful in your research, please consider citing:

@inproceedings{dumitrache2018frames,
  Author = {Anca Dumitrache and Lora Aroyo and Chris Welty},
  Title = {A Crowdsourced Frame Disambiguation Corpus with Ambiguity},
  Booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  Year = {2019}
}
