AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Home Page: https://amenra.github.io/ranx


Why does Recall@k divide by len(relevant) rather than min(len(relevant), k)?

PososikTeam opened this issue

The question about Recall@k arose when I looked at the best R@1 scores for the Stanford Online Products dataset on Papers with Code: https://paperswithcode.com/sota/metric-learning-on-stanford-online-products-1. This benchmark uses the R@1 metric to rank the best models and approaches for the retrieval task on the SOP dataset. SOP has about 4.3 images per class (query), so the maximum R@1 score achievable with ranx's formula would be around 1 / 4.3.

The SOP benchmark, and many others, instead use min(len(relevant), k) as the divisor.

What do you think about changing this divisor? And why does Papers with Code report R@1 when it is actually not R@1 but HitRate@1?
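
For concreteness, here is a minimal sketch (plain Python, not ranx code) of the two conventions being contrasted; the function names are made up for illustration:

```python
def recall_at_k_trec(retrieved, relevant, k):
    """trec_eval / ranx convention: divide by the total number of relevant items."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def recall_at_k_capped(retrieved, relevant, k):
    """Convention described above for SOP-style benchmarks: divide by min(len(relevant), k)."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / min(len(relevant), k)

# With 4 relevant items and k = 1, a correct top-1 result scores 0.25 vs 1.0.
retrieved = ["img_1", "img_7", "img_9"]
relevant = ["img_1", "img_2", "img_3", "img_4"]
print(recall_at_k_trec(retrieved, relevant, 1))    # 0.25
print(recall_at_k_capped(retrieved, relevant, 1))  # 1.0
```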

Hi, that's an interesting question!

ranx is built to reproduce trec_eval's scores, as it is the standard evaluation library used in Information Retrieval research.
ranx's recall@k works as trec_eval's recall@k.

I was not aware of what you brought to my attention. However, it seems to be known.
From PyTerrier's documentation section about Recall@k:

Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k.

NOTE: Some tasks define Recall@k as whether any relevant documents are found in the top k results.
This software follows the TREC convention and refers to that measure as Success@k.

trec_eval's Success measure is called Hit Rate in ranx.
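
For illustration, here is a small sketch using ranx's Qrels/Run/evaluate interface (the query and document ids are made up) that shows the difference on a single query with four relevant documents, assuming the metric names "recall@1" and "hit_rate@1":

```python
from ranx import Qrels, Run, evaluate

# One query with four relevant documents (like an SOP class with ~4 images).
qrels = Qrels({"q_1": {"doc_a": 1, "doc_b": 1, "doc_c": 1, "doc_d": 1}})

# The top-ranked document is relevant.
run = Run({"q_1": {"doc_a": 0.9, "doc_x": 0.8, "doc_y": 0.7}})

# recall@1 divides by len(relevant) = 4 -> 0.25
# hit_rate@1 checks whether any relevant document is in the top 1 -> 1.0
print(evaluate(qrels, run, ["recall@1", "hit_rate@1"]))
```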

Let me know if this answers your question, and consider giving ranx a star if you like it!

Best,

Elias

Yes, you answered my question, thank you very much.