castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home Page: http://pyserini.io/

How are you handling duplicate entries for the corpus and qrels?

steven-channel opened this issue

While running evaluation on the MIRACL-Korean benchmark, I noticed that the qrels and corpus files contain duplicate IDs, which is causing errors. Is this handled anywhere in Pyserini?

It looks like a similar issue came up in Anserini:

castorini/anserini#720
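
For reference, here is a minimal sketch of how the duplicates could be detected. It is not Pyserini code. It assumes the corpus is JSONL with one document per line and an ID field named `docid`, and that the qrels are in the four-column TREC format (`qid Q0 docid relevance`); the function names `duplicate_corpus_ids` and `duplicate_qrels_pairs` are hypothetical helpers made up for this example.

```python
import json
from collections import Counter


def duplicate_corpus_ids(lines, id_field="docid"):
    """Return {doc_id: count} for every ID that appears more than once.

    `lines` is any iterable of JSONL strings (for example an open file).
    The `docid` field name is an assumption about the corpus layout.
    """
    counts = Counter(
        json.loads(line)[id_field] for line in lines if line.strip()
    )
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def duplicate_qrels_pairs(lines):
    """Return {(qid, docid): count} for every repeated judgment.

    Assumes whitespace-separated TREC qrels: qid, iteration, docid, relevance.
    Lines that do not have exactly four columns are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        qid, _, docid, _ = parts
        counts[(qid, docid)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Toy data standing in for the real files.
    corpus = [
        '{"docid": "d1", "text": "a"}',
        '{"docid": "d2", "text": "b"}',
        '{"docid": "d1", "text": "a again"}',
    ]
    qrels = ["q1 Q0 d1 1", "q1 Q0 d2 0", "q1 Q0 d1 1"]
    print(duplicate_corpus_ids(corpus))
    print(duplicate_qrels_pairs(qrels))
```

With real files, an open file handle can be passed in place of the lists, since both helpers only iterate over lines.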