danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2


Questions on Imagination MDP and imagination horizon H = 15

GoingMyWay opened this issue

Dear author,

After reading the code and the paper, I am confused about why the imagination MDP is introduced and why an imagination horizon is needed. For example, with a trained world model and a given trajectory $\tau$, we could simply sample one initial state and simulate a single long trajectory with the world model. In DreamerV2, however, each state along the sampled trajectory is used as the start of its own imagined sub-trajectory of length $H = 15$, and these sub-trajectories are then used to update the policy. Why is this approach feasible for training model-based RL? It looks like magic to me. Could you help me understand it?
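To make my reading concrete, here is a minimal sketch of what I believe the imagination rollout does. This is only my understanding, not your actual code: `dynamics_step` and `policy` are hypothetical stand-ins for the learned latent dynamics and the actor, and the shapes are made up for illustration.

```python
# Sketch of DreamerV2-style imagination as I understand it: every posterior
# state along a replayed trajectory is treated as an initial state of the
# imagination MDP and rolled forward H = 15 steps with the learned dynamics.
# dynamics_step and policy are hypothetical placeholders, not the real model.

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, H = 8, 4, 15   # H is the imagination horizon


def dynamics_step(state, action):
    """Hypothetical latent dynamics: next latent state from (state, action)."""
    return np.tanh(state + 0.1 * action.sum(-1, keepdims=True))


def policy(state):
    """Hypothetical actor: sample an action from the current latent state."""
    return rng.normal(size=state.shape[:-1] + (ACTION_DIM,))


# Posterior states inferred from one replayed trajectory of length T.
T = 50
posterior_states = rng.normal(size=(T, STATE_DIM))

# Every timestep becomes a start state, so one replayed trajectory of
# length T yields T imagined sub-trajectories of length H, all in parallel.
states = posterior_states                      # shape (T, STATE_DIM)
imagined = [states]
for _ in range(H):
    actions = policy(states)
    states = dynamics_step(states, actions)
    imagined.append(states)

imagined = np.stack(imagined)                  # shape (H + 1, T, STATE_DIM)
print(imagined.shape)                          # (16, 50, 8)
```

If this picture is right, my question is essentially: why roll out $T$ short sub-trajectories of length 15 from every replayed state, rather than one long imagined trajectory from a single sampled initial state?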