Why beam search results are sorted based on loss, not on ROUGE scores
astariul opened this issue
Currently, the beam search results are sorted by the loss (the average log-probability of each hypothesis):
sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]
(here)
The ROUGE score that is then counted is the one for the first result:
r1 += scores[0]['1_f']
(here)
Why this decision? Aren't we looking for the best ROUGE score? We don't really care about the loss at test time, do we?
I was confused. Sorry for the mess.
Note to myself for later:
Of course you can't do that! This is testing, which means we are not supposed to see the gold abstract. Predictions must be made without seeing the gold labels, and therefore without access to the ROUGE score.
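To make the point concrete, here is a minimal sketch, not the repository's actual code. The `Hypothesis` class, the `rank_by_model_score` helper, and the toy `unigram_f1` function (a simplified stand-in for ROUGE-1 F1, not a real ROUGE implementation) are all assumptions made for illustration. It contrasts ranking by model score, which needs no reference, with picking the hypothesis by ROUGE, which would require the gold abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Hypothetical beam search hypothesis (illustrative, not the repo's class)."""
    tokens: List[str]
    log_probs: List[float]  # per-token log-probabilities from the decoder

    @property
    def avg_log_prob(self) -> float:
        return sum(self.log_probs) / len(self.log_probs)


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """Toy unigram-overlap F1, a simplified stand-in for ROUGE-1 F1."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def rank_by_model_score(results: List[Hypothesis], beam_size: int) -> List[Hypothesis]:
    """Rank hypotheses using only the model's own scores (no reference needed)."""
    return sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]


# The gold abstract: available to the evaluator, never to the decoder.
reference = "the cat sat on the mat".split()

results = [
    Hypothesis("a cat is on a mat".split(), [-0.2, -0.3, -0.1, -0.2, -0.2, -0.2]),
    Hypothesis("the cat sat on the mat".split(), [-0.9, -0.8, -0.7, -0.9, -0.8, -0.7]),
]

# Legitimate: the prediction is chosen first, then scored against the reference.
prediction = rank_by_model_score(results, beam_size=2)[0]
reported = unigram_f1(prediction.tokens, reference)

# Not legitimate at test time: choosing the hypothesis by looking at the reference.
oracle = max(unigram_f1(h.tokens, reference) for h in results)

print(f"reported score: {reported:.2f}, oracle score: {oracle:.2f}")
```

In this toy example the hypothesis the model prefers is not the one closest to the reference, so the oracle score is higher than the reported one. That gap is exactly what cannot be exploited at test time: the reference is only there to evaluate the prediction, not to help select it.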