observation stacking?

Question

mwalton opened this issue 5 years ago

We're working to reproduce some of the results in the original paper. The paper states that the Rainbow agent "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".

However, in the code the Rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations) this doesn't seem to affect cumulative return much either way. Could someone clarify whether observation stacking was used for the results in the paper?

Jun Tian · Answer 1 · Wed Aug 28 2019 10:42:19 GMT+0800 (China Standard Time)

I think the following line in the Rainbow gin config indicates that only the latest observation is used.

https://github.com/deepmind/hanabi-learning-environment/blob/253d6fff48dac3d2118cefc308fee156a7de9445/agents/rainbow/configs/hanabi_rainbow.gin#L39

nolanbard · Answer 2 · Fri Sep 20 2019 05:47:02 GMT+0800 (China Standard Time)

Confirming that the results presented in the paper did not use any observation stacking.
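
For readers less familiar with the term, here is a rough sketch of what observation stacking means and why a history size of 1 amounts to no stacking. The class name `ObservationStacker` and its parameters `history_size` and `observation_size` are hypothetical and chosen for illustration only; this is not the repository's implementation.

```python
from collections import deque


class ObservationStacker:
    """Hypothetical helper that keeps the last `history_size` observations."""

    def __init__(self, history_size, observation_size):
        self.history_size = history_size
        self.observation_size = observation_size
        self.reset()

    def reset(self):
        # Fill the history with all-zero observations at the start of an episode.
        self._frames = deque(
            [[0.0] * self.observation_size for _ in range(self.history_size)],
            maxlen=self.history_size,
        )

    def add(self, observation):
        # Appending to a full deque drops the oldest observation automatically.
        if len(observation) != self.observation_size:
            raise ValueError("unexpected observation length")
        self._frames.append(list(observation))

    def get(self):
        # Concatenate the stored observations, oldest first.
        return [value for frame in self._frames for value in frame]


# With history_size=1 the network input is just the latest observation.
single = ObservationStacker(history_size=1, observation_size=3)
single.add([1, 0, 1])
single_input = single.get()

# With history_size=4 the input is four observations long (zero-padded early on).
stacked = ObservationStacker(history_size=4, observation_size=3)
stacked.add([1, 0, 1])
stacked_input = stacked.get()
```

Under these assumptions, the only difference between the two settings is the length of the input vector the agent sees, so the paper's "no observation stacking" corresponds to the `history_size=1` case.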