How are you handling duplicate entries for the corpus and qrels?

Question

steven-channel opened this issue a month ago · comments

While running an evaluation on the MIRACL-Korean benchmark, I noticed that the qrels and corpus files contain duplicate IDs, which is causing errors. Is this being handled somewhere?

steven-channel · Answer 1 · Wed May 29 2024 18:05:12 GMT+0800 (China Standard Time)

It seems a similar issue came up in Anserini?

castorini/anserini#720
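
In case it helps with reproducing this, here is a minimal sketch for listing the duplicates. It assumes the corpus is JSONL with the document ID in a `docid` field and that the qrels are in TREC format (`query-id Q0 doc-id relevance`, whitespace-separated). Both layouts are assumptions about the files, not something confirmed in this thread, and the function names are hypothetical helpers rather than part of any library.

```python
import json
from collections import Counter


def duplicate_corpus_ids(lines, id_field="docid"):
    """Return {doc_id: count} for IDs seen more than once in JSONL lines.

    `id_field` is an assumption about the corpus schema; adjust as needed.
    """
    counts = Counter(
        json.loads(line)[id_field] for line in lines if line.strip()
    )
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def duplicate_qrels_pairs(lines):
    """Return {(query_id, doc_id): count} for pairs seen more than once.

    Assumes TREC-style qrels: query-id, iteration, doc-id, relevance.
    Lines with fewer than four fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue
        query_id, _, doc_id, _ = parts[:4]
        counts[(query_id, doc_id)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}
```

Both functions accept any iterable of lines, so an open file handle can be passed directly. This only reports the duplicates; whether to keep the first entry, the last, or treat conflicting relevance labels as an error is a separate decision.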