ucinlp / autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

Additional Wikidata triples

JanKalo opened this issue

Hi there,

We are interested in working with your training data set for fact extraction.
In the paper you mention that T-REx does not contain 1000 triples for every property, so you added extra triples from Wikidata. However, I cannot find these triples in the .jsonl files, and some of the properties actually have fewer than 1000 triples.
Am I missing something?
It would be nice if you could clarify where I can find these additional triples, or whether you ended up not using them for training after all.
Best,
Jan

Hi Jan, thanks for your interest in our paper! The extra triples from Wikidata are in the train.jsonl and val.jsonl files for each relation. Unfortunately, some relations have very few triples in general, so relations like P1376 and P108 have fewer than 1000 data points. We tried our best to collect up to 1000 triples for the rest of the relations, though.
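To check which relations fall short of 1000 triples, one option is to sum the lines of train.jsonl and val.jsonl per relation. The sketch below assumes, hypothetically, that each relation has its own directory named by its property ID (e.g. `P108/`) holding those two files, and that each non-empty line is one triple; the actual repository layout may differ. `count_triples` and `under_threshold` are illustrative names, not part of AutoPrompt.

```python
import sys
from pathlib import Path


def count_triples(root):
    """Sum non-empty lines of train.jsonl and val.jsonl for each relation directory.

    Hypothetical layout assumed: root/<relation_id>/{train,val}.jsonl,
    with one triple per line.
    """
    counts = {}
    for rel_dir in sorted(Path(root).iterdir()):
        if not rel_dir.is_dir():
            continue  # skip stray files such as READMEs
        total = 0
        for split in ("train.jsonl", "val.jsonl"):
            path = rel_dir / split
            if not path.exists():
                continue  # a relation may be missing one of the splits
            with path.open(encoding="utf-8") as f:
                total += sum(1 for line in f if line.strip())
        counts[rel_dir.name] = total
    return counts


def under_threshold(counts, threshold=1000):
    """Return relation IDs with fewer than `threshold` triples, smallest count first."""
    return sorted((r for r, n in counts.items() if n < threshold), key=lambda r: counts[r])


if __name__ == "__main__":
    # Usage: python count_triples.py <data_root>
    counts = count_triples(sys.argv[1] if len(sys.argv) > 1 else ".")
    for rel in under_threshold(counts):
        print(f"{rel}: {counts[rel]} triples")
```

Only the train and val splits are counted here, since those are the files the extra Wikidata triples were added to.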

Oh, you are right. I was somehow expecting more relations to have 1000 data points.
Thanks.