Rouge scores mismatch

Question

poojithansl opened this issue 6 years ago

The work you have put into this project is impressive.

We used the model provided here under the section "Train with pointer generation + coverage loss enabled" to decode.
The ROUGE scores we obtained differ slightly from those posted here.

Our ROUGE scores:
ROUGE-1:
rouge_1_f_score: 0.3680 with confidence interval (0.3658, 0.3701)
rouge_1_recall: 0.4234 with confidence interval (0.4208, 0.4261)
rouge_1_precision: 0.3471 with confidence interval (0.3446, 0.3496)

ROUGE-2:
rouge_2_f_score: 0.1485 with confidence interval (0.1464, 0.1507)
rouge_2_recall: 0.1706 with confidence interval (0.1682, 0.1731)
rouge_2_precision: 0.1407 with confidence interval (0.1385, 0.1429)

ROUGE-L:
rouge_l_f_score: 0.3327 with confidence interval (0.3306, 0.3349)
rouge_l_recall: 0.3827 with confidence interval (0.3802, 0.3853)
rouge_l_precision: 0.3139 with confidence interval (0.3116, 0.3164)
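
For readers comparing numbers, here is a minimal sketch of how ROUGE-N precision, recall and F-score can be computed for a single candidate/reference pair. It is an illustration only, not the scoring code used for the figures above: the function name `rouge_n` is hypothetical, tokenization is plain whitespace splitting, and the standard ROUGE toolkit additionally applies options such as stemming and bootstrap resampling for the confidence intervals.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f_score) for one candidate/reference pair.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    # Balanced F-score: harmonic mean of precision and recall.
    f_score = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


p, r, f = rouge_n("the cat sat on the mat", "the cat lay on the mat")
print(f"precision={p:.4f} recall={r:.4f} f_score={f:.4f}")
```

Small differences in tokenization, truncation length or decoding settings change these per-pair counts, which is one plausible way that corpus-level scores can drift by a fraction of a point.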

Which config parameters should we use to reproduce the scores reported in the README?

Atul Kumar · Answer 1 · Tue Jul 31 2018 15:22:20 GMT+0800 (China Standard Time)

I have uploaded my test output here

I will try to reproduce this problem when I have time.