avg & best the same

Question

DevZiegler opened this issue 5 years ago

Hi, is it possible that apply_avg and apply_best always produce the same output, no matter which one is selected?

Evaluation with Avg
	rouge-1:	P: 75.71	R: 75.00	F1: 74.72
	rouge-2:	P: 66.67	R: 62.50	F1: 63.93
	rouge-3:	P: 58.33	R: 50.00	F1: 53.33
	rouge-4:	P: 50.00	R: 37.50	F1: 41.67
	rouge-l:	P: 78.85	R: 78.61	F1: 78.26
	rouge-w:	P: 74.65	R: 53.28	F1: 61.69

Evaluation with Best
	rouge-1:	P: 75.71	R: 75.00	F1: 74.72
	rouge-2:	P: 66.67	R: 62.50	F1: 63.93
	rouge-3:	P: 58.33	R: 50.00	F1: 53.33
	rouge-4:	P: 50.00	R: 37.50	F1: 41.67
	rouge-l:	P: 78.85	R: 78.61	F1: 78.26
	rouge-w:	P: 74.65	R: 53.28	F1: 61.69

Evaluation with Individual
	Hypothesis #0 & Reference #0: 
		rouge-1:	P: 80.00	R: 80.00	F1: 80.00
	Hypothesis #1 & Reference #0: 
		rouge-1:	P: 42.86	R: 60.00	F1: 50.00
	Hypothesis #2 & Reference #0: 
		rouge-1:	P: 100.00	R: 80.00	F1: 88.89
	Hypothesis #3 & Reference #0: 
		rouge-1:	P: 80.00	R: 80.00	F1: 80.00

	Hypothesis #0 & Reference #0: 
		rouge-2:	P: 75.00	R: 75.00	F1: 75.00
	Hypothesis #1 & Reference #0: 
		rouge-2:	P: 16.67	R: 25.00	F1: 20.00
	Hypothesis #2 & Reference #0: 
		rouge-2:	P: 100.00	R: 75.00	F1: 85.71
	Hypothesis #3 & Reference #0: 
		rouge-2:	P: 75.00	R: 75.00	F1: 75.00

	Hypothesis #0 & Reference #0: 
		rouge-3:	P: 66.67	R: 66.67	F1: 66.67
	Hypothesis #1 & Reference #0: 
		rouge-3:	P:  0.00	R:  0.00	F1:  0.00
	Hypothesis #2 & Reference #0: 
		rouge-3:	P: 100.00	R: 66.67	F1: 80.00
	Hypothesis #3 & Reference #0: 
		rouge-3:	P: 66.67	R: 66.67	F1: 66.67

	Hypothesis #0 & Reference #0: 
		rouge-4:	P: 50.00	R: 50.00	F1: 50.00
	Hypothesis #1 & Reference #0: 
		rouge-4:	P:  0.00	R:  0.00	F1:  0.00
	Hypothesis #2 & Reference #0: 
		rouge-4:	P: 100.00	R: 50.00	F1: 66.67
	Hypothesis #3 & Reference #0: 
		rouge-4:	P: 50.00	R: 50.00	F1: 50.00

	Hypothesis #0 & Reference #0: 
		rouge-l:	P: 83.03	R: 83.03	F1: 83.03
	Hypothesis #1 & Reference #0: 
		rouge-l:	P: 49.36	R: 65.33	F1: 56.23
	Hypothesis #2 & Reference #0: 
		rouge-l:	P: 100.00	R: 83.03	F1: 90.73
	Hypothesis #3 & Reference #0: 
		rouge-l:	P: 83.03	R: 83.03	F1: 83.03

	Hypothesis #0 & Reference #0: 
		rouge-w:	P: 80.00	R: 57.98	F1: 67.23
	Hypothesis #1 & Reference #0: 
		rouge-w:	P: 38.61	R: 39.18	F1: 38.89
	Hypothesis #2 & Reference #0: 
		rouge-w:	P: 100.00	R: 57.98	F1: 73.40
	Hypothesis #3 & Reference #0: 
		rouge-w:	P: 80.00	R: 57.98	F1: 67.23
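A quick arithmetic check on the numbers above: the figures printed under both "Avg" and "Best" agree, to two decimals, with the unweighted mean of the four per-hypothesis scores in the "Individual" output. For example, rouge-1 P is (80.00 + 42.86 + 100.00 + 80.00) / 4 ≈ 75.71. The snippet below only re-does this arithmetic on the printed values. It does not call py-rouge, and the dictionary names are illustrative.

```python
# Per-hypothesis (P, R, F1) scores copied from the "Individual" output above:
# Hypotheses #0-#3, each scored against the single Reference #0.
individual = {
    "rouge-1": [(80.00, 80.00, 80.00), (42.86, 60.00, 50.00), (100.00, 80.00, 88.89), (80.00, 80.00, 80.00)],
    "rouge-2": [(75.00, 75.00, 75.00), (16.67, 25.00, 20.00), (100.00, 75.00, 85.71), (75.00, 75.00, 75.00)],
    "rouge-3": [(66.67, 66.67, 66.67), (0.00, 0.00, 0.00), (100.00, 66.67, 80.00), (66.67, 66.67, 66.67)],
    "rouge-4": [(50.00, 50.00, 50.00), (0.00, 0.00, 0.00), (100.00, 50.00, 66.67), (50.00, 50.00, 50.00)],
    "rouge-l": [(83.03, 83.03, 83.03), (49.36, 65.33, 56.23), (100.00, 83.03, 90.73), (83.03, 83.03, 83.03)],
    "rouge-w": [(80.00, 57.98, 67.23), (38.61, 39.18, 38.89), (100.00, 57.98, 73.40), (80.00, 57.98, 67.23)],
}

# Aggregate (P, R, F1) printed identically under "Evaluation with Avg" and "Evaluation with Best".
reported = {
    "rouge-1": (75.71, 75.00, 74.72),
    "rouge-2": (66.67, 62.50, 63.93),
    "rouge-3": (58.33, 50.00, 53.33),
    "rouge-4": (50.00, 37.50, 41.67),
    "rouge-l": (78.85, 78.61, 78.26),
    "rouge-w": (74.65, 53.28, 61.69),
}


def mean_over_hypotheses(rows):
    """Unweighted mean of each column (P, R, F1) across all hypotheses."""
    return tuple(sum(col) / len(col) for col in zip(*rows))


for metric, rows in individual.items():
    means = mean_over_hypotheses(rows)
    # A tolerance of 0.01 absorbs the two-decimal rounding of the printed values.
    ok = all(abs(got - want) < 0.01 for got, want in zip(means, reported[metric]))
    p, r, f = means
    print(f"{metric}: P {p:6.2f}  R {r:6.2f}  F1 {f:6.2f}  matches reported: {ok}")
```

Because every hypothesis has exactly one reference, there is only one score per hypothesis to "average" or to pick the "best" of, so both modes feed the same numbers into this final mean.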

Markus · Fri Nov 15 2019 22:28:52 GMT+0800 (China Standard Time)

Hi,

You are only using one reference summary per hypothesis. With a single reference, averaging over the references and taking the best reference both reduce to the score with respect to that one reference, so the two modes return identical results. You will only see a difference when you provide multiple, different reference summaries.
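To make the distinction concrete, here is a minimal sketch of the two aggregation modes. It is a simplified, hypothetical re-implementation of the idea, not py-rouge's actual code: `rouge1_f1` uses plain whitespace tokenization with no stemming, `aggregate` is an illustrative helper, and the library's exact per-reference aggregation may differ from this simple mean/max. Each hypothesis is scored against each of its references. "avg" takes the mean of those scores and "best" takes the maximum. Both are then averaged over hypotheses.

```python
from collections import Counter


def rouge1_f1(hypothesis: str, reference: str) -> float:
    """Unigram-overlap F1 (ROUGE-1) with whitespace tokenization and no stemming."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def aggregate(hypotheses, references, mode):
    """Score each hypothesis against all of its references, then average over hypotheses.

    mode == "avg":  mean of the per-reference scores for each hypothesis
    mode == "best": maximum of the per-reference scores for each hypothesis
    """
    per_hypothesis = []
    for hyp, refs in zip(hypotheses, references):
        scores = [rouge1_f1(hyp, ref) for ref in refs]
        per_hypothesis.append(sum(scores) / len(scores) if mode == "avg" else max(scores))
    return sum(per_hypothesis) / len(per_hypothesis)


hyps = ["the cat sat on the mat"]
single = [["the cat sat on a mat"]]                         # one reference per hypothesis
multi = [["the cat sat on a mat", "a dog slept outside"]]   # two different references

print(aggregate(hyps, single, "avg"), aggregate(hyps, single, "best"))
print(aggregate(hyps, multi, "avg"), aggregate(hyps, multi, "best"))
```

With the single reference, both modes give 5/6 ≈ 0.833. Adding the unrelated second reference pulls "avg" down to 5/12 ≈ 0.417, while "best" stays at 0.833, because it ignores the worse-matching reference.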