google-deepmind / hanabi-learning-environment

hanabi_learning_environment is a research platform for Hanabi experiments.

Score at final state might be calculated wrong

AlbertoVillanueva opened this issue.

The environment score seems to be 0 in the final state when the game is lost by running out of lives. As a result, the reward for that last step is -score (the negative of the score reached before the loss). When the game is won, this does not happen: the score stays equal to the sum of the fireworks list.

This might be an error, since the score in the final state should be the sum of all fireworks placed, whether or not the game is won.

I've attached the code I used to test this, along with one of its outputs.

I realized that I had missed the footnote on page 6 of the Hanabi Challenge paper:

  1. Note that while scoring zero when the team runs out of lives agrees with the game’s published rules, much of the prior research on Hanabi (discussed in Section 5.1) scores this as the number of cards successfully played prior to failure.
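The two scoring conventions in the footnote, and their effect on a reward computed as the change in score between consecutive states, can be sketched as follows. This is a minimal illustration, not code from hanabi_learning_environment: `score_published_rules`, `score_cards_played` and `step_reward` are hypothetical helpers, and treating the reward as a score difference is an assumption based on the behaviour described in the issue.

```python
from typing import Callable, List, Tuple

# A state is modelled here as (fireworks, life_tokens); hypothetical, for illustration only.
State = Tuple[List[int], int]


def score_published_rules(fireworks: List[int], life_tokens: int) -> int:
    """Published rules: running out of lives scores zero."""
    if life_tokens == 0:
        return 0
    return sum(fireworks)


def score_cards_played(fireworks: List[int], life_tokens: int) -> int:
    """Prior-research convention: count cards successfully played, even after failure."""
    return sum(fireworks)


def step_reward(score_fn: Callable[[List[int], int], int], before: State, after: State) -> int:
    """Assumed reward: score of the new state minus score of the previous state."""
    return score_fn(*after) - score_fn(*before)


# Five colours with 7 cards played in total; a final misplay costs the last life.
before: State = ([2, 1, 3, 1, 0], 1)
after: State = ([2, 1, 3, 1, 0], 0)

print(step_reward(score_published_rules, before, after))  # -7
print(step_reward(score_cards_played, before, after))     # 0
```

Under the published rules the losing step cancels the whole accumulated score, which matches the -score reward reported above. Under the prior-research convention the same step yields no reward change.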

It is working as intended.