dbpedia / fact-extractor

Fact Extraction from Wikipedia Text

The subject should be the article URI from which the fact was extracted

marfox opened this issue.

Currently, we don't keep track of the sentences' provenance (i.e., the Wikipedia article they come from).
An extracted fact currently looks like this:

resource:SENTENCE0000 fact:Attività fact:Attività0000 .

We only keep track of the sentence ID from which the fact was extracted.
Instead, we should link to the Wikipedia article containing that sentence:

resource:${WIKIPEDIA_ARTICLE} fact:Attività fact:Attività0000 .
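A minimal sketch of the target serialization, written in Python for illustration. The helper names `article_subject` and `fact_triple` are hypothetical, the article title is assumed to be already known, and the `resource:` and `fact:` prefixes are assumed to be declared elsewhere in the output. Spaces in the title become underscores, following the Wikipedia URL convention.

```python
# Hypothetical helpers: serialize a fact whose subject is the Wikipedia
# article rather than the sentence ID. Assumes the `resource:` and `fact:`
# prefixes are declared elsewhere in the Turtle output.


def article_subject(title: str) -> str:
    """Turn an article title into a prefixed resource name (spaces -> underscores)."""
    return "resource:" + title.strip().replace(" ", "_")


def fact_triple(title: str, predicate: str, obj: str) -> str:
    """Serialize one fact with the article as its subject."""
    return f"{article_subject(title)} fact:{predicate} fact:{obj} ."


print(fact_triple("Leonardo da Vinci", "Attività", "Attività0000"))
# resource:Leonardo_da_Vinci fact:Attività fact:Attività0000 .
```

Titles containing characters that Turtle does not allow unescaped in prefixed names (e.g. parentheses or commas) would need backslash escapes or a full IRI; the sketch does not handle that case.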

The WikiExtractor already keeps track of each article's URL.
This information should be passed on to this function of the script, which also produces the dataset for the unsupervised method.
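One possible way to collect the article URL and title so they can be handed to that function: a sketch that reads the `<doc id="..." url="..." title="...">` header line WikiExtractor writes before each article. The header layout is assumed from the classic WikiExtractor output, and `read_doc_headers` is a hypothetical name, not part of either code base.

```python
import re
from typing import Dict, Iterable, Tuple

# Opening tag WikiExtractor writes before each article (assumed format), e.g.
# <doc id="12" url="https://it.wikipedia.org/wiki?curid=12" title="Anarchismo">
DOC_OPEN = re.compile(r"^<doc\b[^>]*>")
# Attributes are matched one by one, so their order does not matter.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def read_doc_headers(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each article's wiki ID to its (url, title) pair."""
    headers: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        match = DOC_OPEN.match(line.strip())
        if match is None:
            continue  # article text or a closing </doc> tag
        attrs = dict(ATTR.findall(match.group(0)))
        headers[attrs["id"]] = (attrs.get("url", ""), attrs.get("title", ""))
    return headers


sample = [
    '<doc id="12" url="https://it.wikipedia.org/wiki?curid=12" title="Anarchismo">',
    "Anarchismo",
    "L'anarchismo è una filosofia politica.",
    "</doc>",
]
print(read_doc_headers(sample))
# {'12': ('https://it.wikipedia.org/wiki?curid=12', 'Anarchismo')}
```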

This works assuming that file names are used as sentence IDs in the format wikiID.uniqID, and that a mapping from wiki ID to article title exists. Both are handled in this new file, which creates file names according to this rule and also builds the mapping itself.
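The naming rule can be sketched as follows. `make_sentence_id`, `split_sentence_id` and `article_title` are hypothetical names, and the zero-padded counter used as uniqID is an assumption; the comment only fixes the `wikiID.uniqID` shape and the existence of a wiki ID to title mapping.

```python
from typing import Dict, Tuple

# Hypothetical helpers illustrating the "wikiID.uniqID" naming rule.
# The zero-padded counter used as uniqID is an assumption for the example.


def make_sentence_id(wiki_id: str, index: int) -> str:
    """Build a sentence ID (also used as its file name) from the article ID."""
    return f"{wiki_id}.{index:04d}"


def split_sentence_id(sentence_id: str) -> Tuple[str, str]:
    """Split "wikiID.uniqID" on the first dot into its two parts."""
    wiki_id, sep, uniq_id = sentence_id.partition(".")
    if not sep or not wiki_id or not uniq_id:
        raise ValueError(f"not in wikiID.uniqID format: {sentence_id!r}")
    return wiki_id, uniq_id


def article_title(sentence_id: str, titles: Dict[str, str]) -> str:
    """Look up the source article through the wiki ID -> title mapping."""
    wiki_id, _ = split_sentence_id(sentence_id)
    return titles[wiki_id]  # raises KeyError if the article is not mapped


titles = {"12": "Anarchismo"}
sid = make_sentence_id("12", 3)
print(sid, "->", article_title(sid, titles))
# 12.0003 -> Anarchismo
```

Splitting on the first dot keeps the wiki ID intact even if a file extension follows the uniqID.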