Why share states across random batches for training the world model?
sai-prasanna opened this issue
From my understanding, the posterior of the last timestep from a batch is used as the start state for the next batch.
Is this intended? If so, is the purpose just to avoid always initializing the start state to zeros, so that the model instead starts from some random sample of the current latent distribution?
Line 60 in 07d906e
This is only used when `is_first` is `False` at the beginning of the training batch. By default, it's always `True`, so the world model resets its hidden state (in the `RSSM` class). But this implementation could also support training with truncated backprop through time on sequences that are too long to fit into memory all at once.
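
To make the two cases concrete, here is a minimal sketch in plain Python. It is not the repository's code: `step` is a toy stand-in for the recurrent update, and `observe`, `carry`, and the input values are hypothetical names and numbers chosen only for illustration. It shows that a carried state is ignored whenever `is_first` is `True`, and that carrying it across chunks with `is_first` set to `False` reproduces the states of one long rollout, which is what truncated backprop through time relies on.

```python
def step(h, x):
    # Toy recurrent update (assumption); the real model uses the RSSM here.
    return 0.5 * h + x


def observe(xs, is_first, carry):
    """Roll the toy cell over one chunk, starting from `carry`.

    Whenever is_first[t] is True, the state is reset to zero before the
    update, so whatever was carried in has no effect from that step on.
    """
    h = carry
    states = []
    for x, first in zip(xs, is_first):
        if first:
            h = 0.0  # reset: discard the carried state
        h = step(h, x)
        states.append(h)
    return states, h  # the last state becomes the carry for the next chunk


seq = [1.0, 2.0, 3.0, 4.0]

# Reference: the whole sequence processed in one pass.
full, _ = observe(seq, [True, False, False, False], 0.0)

# Truncated-BPTT style: two chunks, the second continues from the carry.
first_half, carry = observe(seq[:2], [True, False], 0.0)
second_half, _ = observe(seq[2:], [False, False], carry)
chunked = first_half + second_half

# Default-style: the second chunk is flagged as a fresh start, so the
# carried state is thrown away.
reset_half, _ = observe(seq[2:], [True, False], carry)

print(full)        # states from the single pass
print(chunked)     # identical to `full`
print(reset_half)  # differs, because the state was reset
```

In an actual training setup the carried state would normally also be detached from the gradient graph between chunks; that detail is omitted from this sketch.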