EstQA contains multiple ids for the same context document


x-tabdeveloping opened this issue 2 months ago

Something I noticed is that the way I uploaded EstQA to HuggingFace and then used it in the benchmark is not suitable for a retrieval task: the same context can appear under multiple questions, each time with a different id, and this is not accounted for.

Potential fixes:

- Just use the answer as the retrieved passage instead of the context.
- Fix the dataset on HuggingFace with multiple tables.
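
The second fix could look roughly like the sketch below: deduplicate the contexts so each distinct text gets exactly one id, then keep questions and relevance judgments in separate tables. The row layout (`question` and `context` keys), the id formats, and the helper name `split_into_tables` are assumptions made for illustration, not the actual EstQA schema.

```python
def split_into_tables(rows):
    """Split flat (question, context) rows into corpus, queries and qrels.

    Assumes each row is a dict with hypothetical "question" and "context" keys.
    """
    corpus = {}       # doc_id -> context text (one id per distinct context)
    text_to_id = {}   # context text -> doc_id, used for deduplication
    queries = {}      # query_id -> question text
    qrels = {}        # query_id -> set of relevant doc_ids

    for i, row in enumerate(rows):
        context = row["context"]
        if context not in text_to_id:
            # First time this text is seen: give it a fresh id.
            doc_id = f"d{len(text_to_id)}"
            text_to_id[context] = doc_id
            corpus[doc_id] = context
        query_id = f"q{i}"
        queries[query_id] = row["question"]
        qrels.setdefault(query_id, set()).add(text_to_id[context])

    return corpus, queries, qrels


# Made-up rows: two questions share the same context.
rows = [
    {"question": "Q about A?", "context": "Passage A"},
    {"question": "Another Q about A?", "context": "Passage A"},
    {"question": "Q about B?", "context": "Passage B"},
]
corpus, queries, qrels = split_into_tables(rows)
```

With this layout, two questions that share a context point at the same document id, so retrieving that document counts as correct for both.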

Kenneth Enevoldsen · Answer 1 · Thu Apr 18 2024 17:34:15 GMT+0800 (China Standard Time)

Since you can have multiple positive retrievals, shouldn't you simply have multiple positive pairs?

Márton Kardos · Answer 2 · Thu Apr 18 2024 17:38:06 GMT+0800 (China Standard Time)

The problem is that the same context gets multiple IDs, so even when the model retrieves the correct context for a question, it might be counted as a false positive. But I'm on it, submitting a PR soon.
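
A toy illustration of this failure mode, using made-up ids and texts rather than real EstQA data: when identical passages are stored under different ids, a retrieval that returns the right text can still fail an id-based check. The function names here are hypothetical, and the text comparison is only a diagnostic, not the fix proposed in this thread.

```python
# Flat upload: every row gets its own context id, even when the text repeats.
corpus = {"c0": "Passage A", "c1": "Passage A", "c2": "Passage B"}

# Each question is linked only to the context id from its own row.
relevant = {"q0": "c0", "q1": "c1", "q2": "c2"}


def is_hit_by_id(query_id, retrieved_id):
    """Score a retrieval by comparing ids, as an id-based evaluation would."""
    return retrieved_id == relevant[query_id]


def is_hit_by_text(query_id, retrieved_id):
    """Score a retrieval by comparing the passage text itself."""
    return corpus[retrieved_id] == corpus[relevant[query_id]]


# Suppose the model returns c1 for q0: the right passage under another id.
hit_by_id = is_hit_by_id("q0", "c1")
hit_by_text = is_hit_by_text("q0", "c1")
print(hit_by_id, hit_by_text)
```

The id-based check rejects the retrieval while the text-based check accepts it, which is the mismatch that deduplicating context ids would remove.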