avg & best the same
DevZiegler opened this issue
DevZiegler commented
Hi, could it be that apply_avg and apply_best always produce the same output, no matter which one is selected?
Evaluation with Avg
rouge-1: P: 75.71 R: 75.00 F1: 74.72
rouge-2: P: 66.67 R: 62.50 F1: 63.93
rouge-3: P: 58.33 R: 50.00 F1: 53.33
rouge-4: P: 50.00 R: 37.50 F1: 41.67
rouge-l: P: 78.85 R: 78.61 F1: 78.26
rouge-w: P: 74.65 R: 53.28 F1: 61.69
Evaluation with Best
rouge-1: P: 75.71 R: 75.00 F1: 74.72
rouge-2: P: 66.67 R: 62.50 F1: 63.93
rouge-3: P: 58.33 R: 50.00 F1: 53.33
rouge-4: P: 50.00 R: 37.50 F1: 41.67
rouge-l: P: 78.85 R: 78.61 F1: 78.26
rouge-w: P: 74.65 R: 53.28 F1: 61.69
Evaluation with Individual
Hypothesis #0 & Reference #0:
rouge-1: P: 80.00 R: 80.00 F1: 80.00
Hypothesis #1 & Reference #0:
rouge-1: P: 42.86 R: 60.00 F1: 50.00
Hypothesis #2 & Reference #0:
rouge-1: P: 100.00 R: 80.00 F1: 88.89
Hypothesis #3 & Reference #0:
rouge-1: P: 80.00 R: 80.00 F1: 80.00
Hypothesis #0 & Reference #0:
rouge-2: P: 75.00 R: 75.00 F1: 75.00
Hypothesis #1 & Reference #0:
rouge-2: P: 16.67 R: 25.00 F1: 20.00
Hypothesis #2 & Reference #0:
rouge-2: P: 100.00 R: 75.00 F1: 85.71
Hypothesis #3 & Reference #0:
rouge-2: P: 75.00 R: 75.00 F1: 75.00
Hypothesis #0 & Reference #0:
rouge-3: P: 66.67 R: 66.67 F1: 66.67
Hypothesis #1 & Reference #0:
rouge-3: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #2 & Reference #0:
rouge-3: P: 100.00 R: 66.67 F1: 80.00
Hypothesis #3 & Reference #0:
rouge-3: P: 66.67 R: 66.67 F1: 66.67
Hypothesis #0 & Reference #0:
rouge-4: P: 50.00 R: 50.00 F1: 50.00
Hypothesis #1 & Reference #0:
rouge-4: P: 0.00 R: 0.00 F1: 0.00
Hypothesis #2 & Reference #0:
rouge-4: P: 100.00 R: 50.00 F1: 66.67
Hypothesis #3 & Reference #0:
rouge-4: P: 50.00 R: 50.00 F1: 50.00
Hypothesis #0 & Reference #0:
rouge-l: P: 83.03 R: 83.03 F1: 83.03
Hypothesis #1 & Reference #0:
rouge-l: P: 49.36 R: 65.33 F1: 56.23
Hypothesis #2 & Reference #0:
rouge-l: P: 100.00 R: 83.03 F1: 90.73
Hypothesis #3 & Reference #0:
rouge-l: P: 83.03 R: 83.03 F1: 83.03
Hypothesis #0 & Reference #0:
rouge-w: P: 80.00 R: 57.98 F1: 67.23
Hypothesis #1 & Reference #0:
rouge-w: P: 38.61 R: 39.18 F1: 38.89
Hypothesis #2 & Reference #0:
rouge-w: P: 100.00 R: 57.98 F1: 73.40
Hypothesis #3 & Reference #0:
rouge-w: P: 80.00 R: 57.98 F1: 67.23
Markus commented
Hi,
You are using only one reference summary per hypothesis. That means the average score and the best score are both simply the score with respect to that single reference summary. You will only get different results here when you use multiple different reference summaries.
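To illustrate, here is a minimal sketch of the idea, not the library's actual code. The helper `aggregate` is hypothetical, and it assumes scores are first combined across the references of each hypothesis (mean for "avg", max for "best") and then averaged over all hypotheses. The single-reference values are the rouge-1 F1 scores from your Individual output; the two-reference values are made up purely for illustration.

```python
from statistics import mean


def aggregate(scores_per_hypothesis, mode):
    """Hypothetical helper: combine per-reference scores for each hypothesis,
    then average over hypotheses. Not part of any real ROUGE package."""
    if mode == "avg":
        per_hypothesis = [mean(refs) for refs in scores_per_hypothesis]
    elif mode == "best":
        per_hypothesis = [max(refs) for refs in scores_per_hypothesis]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mean(per_hypothesis)


# One reference per hypothesis (rouge-1 F1 from the Individual output above).
single_ref = [[80.00], [50.00], [88.89], [80.00]]

# Two references per hypothesis (invented numbers, for illustration only).
multi_ref = [[80.0, 60.0], [50.0, 70.0]]

avg_single = aggregate(single_ref, "avg")
best_single = aggregate(single_ref, "best")
avg_multi = aggregate(multi_ref, "avg")
best_multi = aggregate(multi_ref, "best")

print(round(avg_single, 2), round(best_single, 2))  # identical values
print(avg_multi, best_multi)                        # differing values
```

With a single reference, the mean and the max of a one-element list are the same number, so both modes reduce to the plain average over your four hypotheses. That is consistent with the rouge-1 F1 of 74.72 reported in both your Avg and Best runs.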