[Feature Request] stddev statistic
jamm1985 opened this issue
Andrey Stepnov commented
It would be good to also calculate the sample standard deviation alongside the average of each metric.
For example: ndcg@50: 11, stddev: 2.8
Elias Bassani commented
Hi! Yes, I should add it.
In the meantime, you can compute it as follows:
from ranx import Qrels, Run, evaluate
import numpy as np
qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }
run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4 },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4 } }
qrels = Qrels(qrels_dict)
run = Run(run_dict)
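# evaluate() stores the per-query scores for each metric in run.scores
evaluate(qrels, run, ["map@5", "mrr"])
# np.std defaults to ddof=0 (population standard deviation); pass ddof=1 for the sample estimate
print(np.std(list(run.scores["map@5"].values())))
print(np.std(list(run.scores["mrr"].values())))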
Elias Bassani commented
Added support for standard deviation in v0.3.19.
Example:
from ranx import Qrels, Run, evaluate
qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }
run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4 },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4 } }
qrels = Qrels(qrels_dict)
run = Run(run_dict)
evaluate(qrels, run, ["map@5", "mrr"], return_std=True)
Output:
{'map@5': {'mean': 0.6416666666666666, 'std': 0.19166666666666662},
'mrr': {'mean': 0.75, 'std': 0.25}}
The standard deviations of the metrics can later be accessed as follows:
run.std_scores
Output:
{'map@5': 0.19166666666666662, 'mrr': 0.25}
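For anyone who wants to sanity-check the 'mrr' numbers above without installing anything, here is a minimal pure-Python sketch. It is not how ranx computes things internally: the `reciprocal_rank` helper is a hypothetical hand-rolled function, and it assumes any judgment greater than 0 counts as relevant. It recomputes the per-query reciprocal ranks from the same example dictionaries and compares the population and sample standard deviations.

```python
import statistics

qrels_dict = {"q_1": {"d_12": 5, "d_25": 3},
              "q_2": {"d_11": 6, "d_22": 1}}
run_dict = {"q_1": {"d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                    "d_36": 0.6, "d_32": 0.5, "d_35": 0.4},
            "q_2": {"d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                    "d_36": 0.6, "d_22": 0.5, "d_35": 0.4}}


def reciprocal_rank(relevant, scored_docs):
    """Hypothetical helper: 1 / rank of the first relevant document, else 0.0."""
    # Rank documents by descending retrieval score.
    ranking = sorted(scored_docs, key=scored_docs.get, reverse=True)
    for rank, doc_id in enumerate(ranking, start=1):
        # Assumption: a judgment > 0 means the document is relevant.
        if relevant.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# Per-query reciprocal ranks, keyed by query id.
rr = {q: reciprocal_rank(qrels_dict[q], run_dict[q]) for q in qrels_dict}
values = list(rr.values())

mean_rr = statistics.mean(values)
population_std = statistics.pstdev(values)  # divides by n
sample_std = statistics.stdev(values)       # divides by n - 1

print(rr)
print(mean_rr, population_std, sample_std)
```

On this two-query example the population formula gives 0.25, which matches the 'std' reported for 'mrr' above, while the sample formula (dividing by n - 1) gives roughly 0.354. If you need the sample estimate, you can compute it yourself from the per-query scores in `run.scores`.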
Please consider giving ranx a star if you haven't yet. :)
Andrey Stepnov commented
@AmenRa thank you!