Score at final state might be calculated wrong

Question

AlbertoVillanueva opened this issue 3 years ago

Alberto Villanueva Nieto commented 3 years ago

The environment score appears to be 0 in the final state when the game is lost by running out of lives, which makes the final reward equal to -score. When the game is won this does not happen, and the score stays equal to the sum of the fireworks list.

This might be a bug, since the score in the final state should be the sum of the fireworks played, regardless of whether the game is won or lost.

I've attached the code I used to test this, along with one of the outputs.

Alberto Villanueva Nieto · Answer 1 · Sat Jun 19 2021 21:58:37 GMT+0800 (China Standard Time)

I realized that I failed to read the footnote on page 6 of the Hanabi Challenge paper:

Note that while scoring zero when the team runs out of lives agrees with the game’s published rules, much of the prior research on Hanabi (discussed in Section 5.1) scores this as the number of cards successfully played prior to failure.

It is working as intended.
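
For anyone else who hits this, here is a minimal sketch of the two scoring conventions the footnote describes. It is not the environment's actual code: the function names `official_score`, `cards_played_score`, and `step_reward` are hypothetical, and it assumes the per-step reward is simply the change in score, which is consistent with the -score reward I observed.

```python
from typing import Sequence


def official_score(fireworks: Sequence[int], life_tokens: int) -> int:
    """Published-rules convention: the score is zero once all lives are lost."""
    return 0 if life_tokens <= 0 else sum(fireworks)


def cards_played_score(fireworks: Sequence[int]) -> int:
    """Convention used in much prior research: cards played before failure."""
    return sum(fireworks)


def step_reward(prev_score: int, new_score: int) -> int:
    """Assumed reward definition: the change in score caused by one move."""
    return new_score - prev_score


# Hypothetical firework heights for the five colours.
fireworks = [2, 1, 0, 3, 1]

before = official_score(fireworks, life_tokens=1)  # one life left
after = official_score(fireworks, life_tokens=0)   # last life just lost

print(step_reward(before, after))     # reward on the losing move
print(cards_played_score(fireworks))  # score under the alternative convention
```

Under these assumptions the losing move gets a reward equal to minus the score accumulated so far, while the alternative convention would still report the sum of the fireworks. If you want the second convention for your own experiments, you would need to track the fireworks sum yourself rather than rely on the final score.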