rali-udem / WiRe57

A Fine-Grained Benchmark for Open Information Extraction

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

WiRe57

This repository contains resources for the Open Information Extraction benchmark WiRe57.

It acts as a companion to the article

William LĂ©chelle, Fabrizio Gotti, Philippe Langlais (2019), WiRe57 : A Fine-Grained Benchmark for Open Information Extraction, LAW XIII 2019 : The 13th Linguistic Annotation Workshop

Introduction

Open Information Extraction (OIE) systems, starting with TextRunner, seek to extract all relational tuples expressed in text, without being bound to an anticipated list of predicates. Such systems have been used recently for relation extraction, question-answering, and for building domain-targeted knowledge bases, among others.

Subsequent extractors (ReVerb, Ollie, ClausIE, Stanford Open IE, OpenIE4, MinIE) have sought to improve yield and precision. Despite this, the task definition is underspecified, and, to the best of our knowledge, there is no gold standard.

In order to mitigate this problem, we built a reference for the task of Open Information Extraction, on five documents. We tentatively resolve a number of issues that arise, including inference and granularity. We seek to better pinpoint the requirements for the task. We produce our annotation guidelines specifying what is correct to extract and what is not. In turn, we use this reference to score existing Open IE systems. We address the non-trivial problem of evaluating the extractions produced by systems against the reference tuples, and share our evaluation script. Among seven compared extractors, we find the MinIE system to perform best.

Available resources

The annotation guidelines define how to annotate English text with triples.

The data directory contains the following files:

  • README.md: Specifies the source of the annotated documents.
  • WiRe57_343-manual-oie.json: The WiRe57 manual reference
  • WiRe57_extractions_by_ollie_clausie_openie_stanford_minie_reverb_props-export.json: Extractions by state-of-the-art OIE systems on WiRe documents.

The code directory contains the evaluation script used in the paper.

About

A Fine-Grained Benchmark for Open Information Extraction


Languages

Language:Python 100.0%