danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

Why share states across random batches for training the world model?

sai-prasanna opened this issue · comments

commented

From my understanding, the posterior of the last timestep of a batch is used as the start state for the next batch.
Is this intended? If so, is it just to avoid always initializing the start state to zeros, so that training sometimes starts from a random sample of the current latent distribution?

state, outputs, mets = self.wm.train(data, state)
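
For context, a minimal self-contained sketch of the state-carrying pattern being described (only the quoted call comes from the repo; the stubbed train function and toy dataset are assumptions for illustration):

# Stand-in for self.wm.train, which returns the final posterior as the
# new state (hypothetical stub, not the repo's actual code).
def wm_train(data, state):
    new_state = {'posterior': data['last_obs']}
    outputs, metrics = None, {}
    return new_state, outputs, metrics

dataset = [{'last_obs': i} for i in range(3)]  # toy batches
state = None  # the first batch starts from the initial state
for data in dataset:
    # The final posterior of this batch becomes the start state of the
    # next batch.
    state, outputs, metrics = wm_train(data, state)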

This is only used when is_first is False at the beginning of the training batch. By default it is always True, so the world model resets its hidden state (in the RSSM class). But this implementation could also support training with truncated backpropagation through time on sequences that are too long to fit into memory at once.
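
A rough illustration of that reset, as a sketch with illustrative names (not the actual RSSM code), assuming the latent state is a dict of [batch, ...] tensors and is_first is a [batch] boolean marking sequences that start a new episode:

import tensorflow as tf

def mask_state(state, is_first):
    # Zero the carried state wherever is_first is True, so those
    # sequences start from the RSSM's initial state; elsewhere keep
    # the state from the previous batch (truncated BPTT across batches).
    keep = 1.0 - tf.cast(is_first, tf.float32)  # [batch]
    def apply(x):
        # Broadcast the [batch] mask to the rank of each state tensor.
        shape = [keep.shape[0]] + [1] * (len(x.shape) - 1)
        return x * tf.reshape(keep, shape)
    return {k: apply(v) for k, v in state.items()}

# Toy usage: sequence 0 continues, sequence 1 starts a new episode.
state = {'deter': tf.ones([2, 4]), 'stoch': tf.ones([2, 3])}
is_first = tf.constant([False, True])
state = mask_state(state, is_first)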