hotpotqa / hotpot

Not all supporting_facts titles in context titles?

Spongeorge opened this issue

Hi, I'm trying to understand how to access the "gold" paragraphs in the dataset and am having difficulties.

It's my understanding that the unique values of supporting_facts['title'] are the titles of the gold paragraphs, e.g. for the first entry of the fullwiki validation split, supporting_facts is:
{ "title": [ "Scott Derrickson", "Ed Wood" ], "sent_id": [ 0, 0 ] }

But the titles of the paragraphs in the context column are:
[ "Adam Collis", "Ed Wood (film)", "Tyler Bates", "Doctor Strange (2016 film)", "Hellraiser: Inferno", "Sinister (film)", "Deliver Us from Evil (2014 film)", "Woodson, Arkansas", "Conrad Brooks", "The Exorcism of Emily Rose" ]

"Ed Wood (film)" is a paragraph title, but "Ed Wood" is not, so how are we meant to map between the two?

In other cases nothing even resembling the supporting fact title is present in the paragraph titles, and only about 60% of the supporting paragraphs can be accessed by title.
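
For anyone wanting to reproduce this kind of check, here is a minimal sketch of the title comparison. It assumes each example is a dict in which both supporting_facts and context carry a "title" list; that layout is an assumption about how the data is loaded, and the function names missing_gold_titles and gold_coverage are made up for illustration. The sample data is the first fullwiki validation entry quoted above.

```python
def missing_gold_titles(example):
    """Return supporting-fact titles with no exactly matching context title."""
    gold = set(example["supporting_facts"]["title"])
    available = set(example["context"]["title"])
    return sorted(gold - available)


def gold_coverage(examples):
    """Fraction of unique gold titles (per example) found among context titles."""
    total = 0
    found = 0
    for ex in examples:
        gold = set(ex["supporting_facts"]["title"])
        total += len(gold)
        found += len(gold) - len(missing_gold_titles(ex))
    return found / total if total else 0.0


# Assumed layout: only the "title" lists are needed for this check.
example = {
    "supporting_facts": {
        "title": ["Scott Derrickson", "Ed Wood"],
        "sent_id": [0, 0],
    },
    "context": {
        "title": [
            "Adam Collis", "Ed Wood (film)", "Tyler Bates",
            "Doctor Strange (2016 film)", "Hellraiser: Inferno",
            "Sinister (film)", "Deliver Us from Evil (2014 film)",
            "Woodson, Arkansas", "Conrad Brooks",
            "The Exorcism of Emily Rose",
        ],
    },
}

print(missing_gold_titles(example))
print(gold_coverage([example]))
```

The match is exact on purpose: "Ed Wood" and "Ed Wood (film)" are different articles, so fuzzy matching would hide the problem rather than solve it.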

Figured it out from reading the paper: the fullwiki split isn't guaranteed to contain the gold paragraphs; they are only guaranteed to be present in the distractor split. Oops!