Why does Recall@k divide by len(relevant) instead of min(len(relevant), k)?
PososikTeam opened this issue
The question about Recall@k came up when I looked at the best R@1 scores on the Stanford Online Products (SOP) dataset on Papers with Code: https://paperswithcode.com/sota/metric-learning-on-stanford-online-products-1. That benchmark uses the R@1 metric to rank the best models and approaches for the retrieval task on SOP. The SOP dataset has about 4.3 images per class (query), so the maximum R@1 score with the ranx formula would be 1 / 4.3.

The SOP benchmark and many other benchmarks instead divide by a coefficient of min(len(relevant), k).

What do you think about overriding this coefficient? And why does Papers with Code report R@1 when it is actually not R@1 but HitRate@1?
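To make the difference concrete, here is a minimal sketch in plain Python (ids and values are made up) of the two denominators:

```python
def recall_at_k(retrieved, relevant, k):
    """trec_eval / ranx convention: divide by the total number of relevant items."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def capped_recall_at_k(retrieved, relevant, k):
    """SOP-style benchmark convention: divide by min(len(relevant), k)."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / min(len(relevant), k)


# A query with 4 relevant items, where the top-1 result is relevant.
retrieved = ["a", "x", "y"]
relevant = ["a", "b", "c", "d"]

print(recall_at_k(retrieved, relevant, k=1))         # 0.25 -> capped at 1 / len(relevant)
print(capped_recall_at_k(retrieved, relevant, k=1))  # 1.0  -> behaves like HitRate@1
```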
Hi, that's an interesting question!
`ranx` is built to reproduce `trec_eval`'s scores, as it is the standard evaluation library used in Information Retrieval research. `ranx`'s `recall@k` works as `trec_eval`'s `recall@k`.
I was not aware of the issue you brought to my attention. However, it seems to be a known one.
From PyTerrier's documentation section about Recall@k:

> Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k. NOTE: Some tasks define Recall@k as whether any relevant documents are found in the top k results. This software follows the TREC convention and refers to that measure as Success@k.
`trec_eval`'s `Success` measure is called `Hit Rate` in `ranx`.
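For illustration, here is a minimal sketch (with made-up query and document ids and scores) of how you could get the SOP-style number from `ranx` by asking for `hit_rate@1` instead of `recall@1`:

```python
from ranx import Qrels, Run, evaluate

# One query with 4 relevant documents; the top-ranked result is relevant.
qrels = Qrels({"q_1": {"d_a": 1, "d_b": 1, "d_c": 1, "d_d": 1}})
run = Run({"q_1": {"d_a": 0.9, "d_x": 0.8, "d_y": 0.7}})

# recall@1 follows trec_eval: 1 hit / 4 relevant = 0.25
# hit_rate@1 is trec_eval's Success@1: any relevant doc in the top 1 -> 1.0
print(evaluate(qrels, run, ["recall@1", "hit_rate@1"]))
```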
Let me know if this answers your question, and consider giving `ranx` a star if you like it!
Best,
Elias
Yes, you answered my question, thank you very much.