The subject should be the article URI from which the fact was extracted


marfox opened this issue 9 years ago

Currently, we don't keep track of sentence provenance (i.e., the Wikipedia article each sentence comes from).
A sample extracted fact now looks like:

resource:SENTENCE0000 fact:Attività fact:Attività0000 .

We only keep track of the sentence ID from which the fact was extracted.
Instead, we should link to the Wikipedia article containing that sentence:

resource:${WIKIPEDIA_ARTICLE} fact:Attività fact:Attività0000 .

Marco Fossati · Answer 1 · Tue Jun 30 2015 17:35:59 GMT+0800 (China Standard Time)

The WikiExtractor remembers the URL of the article.
That information should be passed to this script function, which also produces the dataset from the unsupervised method.
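
As a rough illustration of keeping the article URL around, here is a minimal sketch. It assumes the extractor output wraps each article in a header line of the form `<doc id="..." url="..." title="...">`; the helper name `build_article_index` is hypothetical and is not taken from the project.

```python
import re

# Assumed header shape: <doc id="12" url="..." title="...">
DOC_HEADER = re.compile(
    r'<doc id="(?P<id>[^"]+)" url="(?P<url>[^"]+)" title="(?P<title>[^"]*)">'
)


def build_article_index(lines):
    """Map each wiki ID to the (url, title) pair found in its <doc> header.

    Hypothetical helper: lines that are not <doc> headers are skipped.
    """
    index = {}
    for line in lines:
        match = DOC_HEADER.match(line.strip())
        if match:
            index[match.group("id")] = (match.group("url"), match.group("title"))
    return index
```

The resulting index could then be handed to whatever function serializes the facts, so the article is known when the subject is written out.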

Emilio Dorigatti · Answer 2 · Sat Jul 04 2015 01:01:57 GMT+0800 (China Standard Time)

This works assuming that file names are used as sentence IDs in the format wikiID.uniqID and that a mapping exists from wiki ID to article title. Both are handled by this new file, which creates file names according to this rule and also builds the mapping.
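
The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `resource:` prefix default, and the replacement of spaces with underscores in titles are hypothetical choices, not the project's actual code.

```python
def article_subject(sentence_id, wiki_id_to_title, prefix="resource:"):
    """Turn a sentence ID of the form wikiID.uniqID into an article subject.

    Hypothetical helper: raises ValueError on a malformed ID and KeyError
    when the wiki ID is missing from the mapping.
    """
    wiki_id, separator, uniq_id = sentence_id.partition(".")
    if not separator or not wiki_id or not uniq_id:
        raise ValueError("expected wikiID.uniqID, got %r" % sentence_id)
    title = wiki_id_to_title[wiki_id]
    # Assumption: article names use underscores instead of spaces.
    return prefix + title.replace(" ", "_")


def fact_triple(sentence_id, predicate, obj, wiki_id_to_title):
    """Serialize one fact with the article, not the sentence, as subject."""
    subject = article_subject(sentence_id, wiki_id_to_title)
    return "%s %s %s ." % (subject, predicate, obj)
```

With a mapping such as `{"1234": "Dante Alighieri"}`, calling `fact_triple("1234.7", "fact:Attività", "fact:Attività0000", mapping)` would yield a triple whose subject is `resource:Dante_Alighieri` rather than a sentence ID.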