How to obtain the original dev subset in a TSV file?

Question

XY2323819551 opened this issue 3 years ago

Hi, I am running "Experiments on MS MARCO Passage Retrieval - Dev Subset - with GPU", and I want the original dev subset (105 queries) as a TSV file, like the "top1000.dev.tsv" and other files that "Passage Re-ranking with BERT" provides. With "Passage Re-ranking with BERT", I can use the conversion scripts to turn a TSV file into TFRecord format, but the full file is too big: I only want to convert the 105 queries, not almost 6800. How can I get that dev subset?

Ronak · Answer 1 · Thu May 27 2021 14:46:13 GMT+0800 (China Standard Time)

The data prep section explains how to get this subset; you already download it in that step. After extracting, it should be in the msmarco_ans_small folder. I'll try to make this clearer in the docs: the dev subset we use is not official. It is just a randomly curated subset of MS MARCO Passage that most users can run these systems on quickly, instead of running on all 6xxx queries.
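If you also want the subset in the same layout as "top1000.dev.tsv" (for example, to feed it to the BERT re-ranking conversion script), one option is to filter the full file by the subset's query IDs. This is a minimal sketch, assuming that the subset's queries come from the dev set, that msmarco_ans_small contains a queries TSV with the query ID in the first column, and that "top1000.dev.tsv" rows are `qid, pid, query, passage` separated by tabs. The function names and any file paths you pass in are hypothetical, so check what the extracted folder actually contains.

```python
from typing import Iterable, Iterator, Set


def load_subset_qids(queries_tsv: Iterable[str]) -> Set[str]:
    """Collect query IDs from a queries TSV whose first column is the qid."""
    qids: Set[str] = set()
    for line in queries_tsv:
        line = line.rstrip("\n")
        if not line:  # skip blank lines
            continue
        qids.add(line.split("\t", 1)[0])
    return qids


def filter_top1000(top1000_tsv: Iterable[str], qids: Set[str]) -> Iterator[str]:
    """Yield only the top1000 rows whose first column (qid) is in the subset."""
    for line in top1000_tsv:
        if line.split("\t", 1)[0] in qids:
            yield line


def write_subset(queries_path: str, top1000_path: str, out_path: str) -> int:
    """Write the filtered rows to out_path and return how many were kept.

    All three paths are hypothetical; point them at your own files.
    """
    with open(queries_path, encoding="utf-8") as f:
        qids = load_subset_qids(f)
    kept = 0
    with open(top1000_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in filter_top1000(src, qids):
            dst.write(line)
            kept += 1
    return kept
```

Streaming the large file line by line keeps memory use low, since only the set of subset query IDs is held in memory. The resulting TSV can then go through the usual TSV-to-TFRecord conversion.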

xiaoyu · Answer 2 · Thu May 27 2021 15:21:37 GMT+0800 (China Standard Time)

OK, thanks for your reply!