google-deepmind / hanabi-learning-environment

hanabi_learning_environment is a research platform for Hanabi experiments.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

observation stacking?

mwalton opened this issue · comments

we're working to reproduce some of the results in the original paper. It is stated that the rainbow agent: "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".

However, in the code the rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations) this doesn't seem to affect cumulative return much either way. Could someone clarify if obs stacking was used for the results in the paper?

Confirming that the results presented in the paper did not use any observation stacking.