danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

Should policy state be reset after every episode?

edwhu opened this issue · comments

It seems the agent's policy state (`self._state`) is not reset to zeros at episode boundaries. It is `None` only before the very first episode, at which point it is initialized to zeros. Since `driver.reset()` is never called again in `api.py`, `self._state` carries over from the previous episode when a new episode begins.
Is this intentional?

```python
obs = {
    i: self._envs[i].reset()
    for i, ob in enumerate(self._obs) if ob is None or ob['is_last']}
for i, ob in obs.items():
  self._obs[i] = ob() if callable(ob) else ob
  act = {k: np.zeros(v.shape) for k, v in self._act_spaces[i].items()}
  tran = {k: self._convert(v) for k, v in {**ob, **act}.items()}
  [fn(tran, worker=i, **self._kwargs) for fn in self._on_resets]
  self._eps[i] = [tran]
```
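For context, the reset path above pairs the fresh observation with an all-zeros action before handing the transition to the on-reset callbacks. A minimal sketch of that construction, with a hypothetical `Spec` class standing in for the driver's action-space specs:

```python
import numpy as np

# Hypothetical spec with a .shape attribute, standing in for the
# action-space entries the driver iterates over.
class Spec:
    def __init__(self, shape):
        self.shape = shape

act_space = {'action': Spec((6,))}

# Fresh observation as returned by env.reset(); is_first marks episode start.
ob = {'image': np.zeros((64, 64, 3), np.uint8),
      'is_first': True, 'is_last': False}

# Zero action for every entry of the action space, as in the driver snippet.
act = {k: np.zeros(v.shape) for k, v in act_space.items()}

# The reset transition merges the observation with the zero action.
tran = {**ob, **act}
```

The zero action matters because the world model conditions on the previous action; at an episode boundary there is no meaningful previous action, so a zero placeholder is recorded instead.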

Yep, the world model resets its state based on the `is_first` flag:

```python
'b,b...->b...', 1.0 - is_first.astype(x.dtype), x),
```
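That einsum scales each batch element of the state by `1 - is_first`, so the carried-over state is zeroed exactly for the environments that just started a new episode, while the others pass through unchanged. A NumPy sketch of the same masking (the `mask` helper and the toy arrays are illustrative, not the repo's code):

```python
import numpy as np

def mask(x, is_first):
    # 'b,b...->b...': multiply every batch element of x by (1 - is_first),
    # zeroing entries whose episode just began and keeping the rest.
    return np.einsum('b,b...->b...', 1.0 - is_first.astype(x.dtype), x)

state = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # batch of latent states
is_first = np.array([0, 1, 0])  # the second env just reset

masked = mask(state, is_first)
# Row 1 is zeroed; rows 0 and 2 are unchanged.
```

This is why the carried-over `self._state` is harmless: it is overwritten with zeros inside the model as soon as an observation arrives with `is_first` set, so an explicit reset in the driver would be redundant.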