Confusezius / Revisiting_Deep_Metric_Learning_PyTorch

(ICML 2020) This repo contains code for our paper "Revisiting Training Strategies and Generalization Performance in Deep Metric Learning" (https://arxiv.org/abs/2002.08473) to facilitate consistent research in the field of Deep Metric Learning.

New metric: mAP@R

sudalvxin opened this issue

Will you add the metric mAP@R (used in "A Metric Learning Reality Check") to this repository?

Hi there!

This repo already contains a mAP@R implementation; however, we take the mean over class-wise average precisions@k. I haven't checked, but I think their implementation does not do the class-wise averaging. If that is the case, I'll include it as well :)
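For illustration, a tiny sketch of the two averaging schemes, assuming per-query average-precision values have already been computed (the helper names are hypothetical, this is not the repo's code):

```python
import numpy as np

def mean_over_queries(ap_per_query):
    # Plain mean: every query contributes equally, regardless of its class.
    return float(np.mean(ap_per_query))

def mean_over_classes(ap_per_query, labels):
    # Class-wise mean: first average within each class, then across classes.
    ap_per_query, labels = np.asarray(ap_per_query), np.asarray(labels)
    class_means = [ap_per_query[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(class_means))
```

The two only coincide when every class contributes the same number of queries, which is why the choice matters on imbalanced test sets.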

The mAP used in this repo may be different from the mAP used in "A Metric Learning Reality Check". The latter considers the ranking of the correct retrievals.

https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-1-per.pdf

This PDF may be useful.

Looking at the implementations, I think the main difference is the per-class averaging. I'll check their mAP variant experimentally and include it here in the next few days :)

Thanks for your contribution to DML.

The mAP@R you mentioned is '.../metrics/mAP.py'. But I think that mAP@R considers only the R nearest neighbors of each query.

What the mAP function does is, for each class, take into account as many samples for the average precision as there are samples available for that specific class (as we have also noted in the paper :)).

Looking over Kevin's implementation, this also seems to be what he is doing - and if you check the supplementary of our paper, where we also list mAP values for every run, the rough values coincide with those listed in his paper :).

The main difference is that we clip the mAP at the maximum number of samples a class can have - I'll offer a second mAP option that does not have this property.

OK, so I have included the standard mAP formulation as metrics/mAP.py and moved the class-limited current version to metrics/mAP_c.py. Both are heavily correlated and insights are transferable - either way, during default training, both will be tracked :). Hope that helps!
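To make the distinction concrete, here is a minimal numpy sketch (not the actual metrics/mAP.py or metrics/mAP_c.py code), assuming embeddings X of shape (N, D) and integer labels y of shape (N,): the standard variant evaluates average precision over the full ranking, while the class-limited variant truncates the ranking at the number of samples available for the query's class.

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    # AP of one ranked retrieval list: mean of precision@i over the relevant positions.
    hits = (ranked_labels == query_label).astype(np.float64)
    if hits.sum() == 0:
        return 0.0
    precisions = np.cumsum(hits) / (np.arange(len(ranked_labels)) + 1)
    return float((precisions * hits).sum() / hits.sum())

def mean_ap(X, y, class_limited=False):
    # Brute-force pairwise distances; fine for small N, use faiss for larger sets.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    aps = []
    for i in range(len(y)):
        order = np.argsort(dists[i])
        order = order[order != i]            # never retrieve the query itself
        if class_limited:
            R = int((y == y[i]).sum()) - 1   # samples of this class, query excluded
            order = order[:R]                # clip the ranking at the class size
        aps.append(average_precision(y[order], y[i]))
    return float(np.mean(aps))
```

Note that implementations also differ in the normalisation (e.g. dividing by R rather than by the number of relevant items actually retrieved), so this is only meant to show where the clipping happens.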

Important note: I have included metrics/mAP_1000.py, which uses mAP@1000, as higher k values are not compatible with faiss-gpu (and are also very costly for larger datasets such as SOP).

Generally, there really is no reason to go beyond k=1000. Even for SOP, people only measure recall@1000, which already is quite debatable as a choice of metric :).

But if needed, you can include mAP in parameters.py/--evaluation_metrics.

Finally, I have included mAP_lim.py (which is also measured by default), which is mAP limited to k=1023 (so essentially mAP@1023 for all benchmark datasets). This is what is used in pytorch-metric-learning :).
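For completeness, a sketch of how such a k-limited mAP could be computed with a flat faiss index (shown on CPU here; faiss-gpu is what imposes the k limit mentioned above). The function name and the exact normalisation are my own choices, not the code in metrics/mAP_1000.py or metrics/mAP_lim.py, and y is assumed to be a numpy integer array.

```python
import faiss
import numpy as np

def map_at_k(X, y, k=1000):
    # X: (N, D) float embeddings, y: (N,) integer labels.
    X = np.ascontiguousarray(X, dtype=np.float32)
    index = faiss.IndexFlatL2(X.shape[1])      # exact L2 search
    index.add(X)
    _, nn = index.search(X, k + 1)             # +1: the query retrieves itself first
    nn = nn[:, 1:]                             # drop the self-match
    hits = (y[nn] == y[:, None]).astype(np.float64)
    precisions = np.cumsum(hits, axis=1) / (np.arange(k) + 1)
    n_rel = np.maximum(hits.sum(axis=1), 1)    # avoid division by zero when no hit is in the top k
    ap = (precisions * hits).sum(axis=1) / n_rel
    return float(ap.mean())
```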

Thank you very much!

I'll close this now, feel free to reopen it if any other question occurs :)