danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2


Questions on Imagination MDP and imagination horizon H = 15

GoingMyWay opened this issue

Dear author,

After reading the code and the paper, I am confused about why the imagination MDP is introduced and why an imagination horizon is needed. For example, with a trained world model and a given trajectory $\tau$, we could simply sample one initial state and simulate a single long trajectory with the world model. In DreamerV2, however, each state along the sampled trajectory is used as the start of its own imagined sub-trajectory of length $H = 15$, and these sub-trajectories are then used to update the policy. Why is this approach feasible for training model-based RL? It looks like magic to me. Could you help me understand it?
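To make my reading concrete, here is a minimal sketch of what I believe the imagination rollout does. This is only my understanding, not your actual code: `dynamics_step` and `policy` are hypothetical stand-ins for the learned latent dynamics and the actor, and the shapes are made up for illustration.

```python
# Sketch of DreamerV2-style imagination as I understand it: every posterior
# state along a replayed trajectory is treated as an initial state of the
# imagination MDP and rolled forward H = 15 steps with the learned dynamics.
# dynamics_step and policy are hypothetical placeholders, not the real model.

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, H = 8, 4, 15   # H is the imagination horizon


def dynamics_step(state, action):
    """Hypothetical latent dynamics: next latent state from (state, action)."""
    return np.tanh(state + 0.1 * action.sum(-1, keepdims=True))


def policy(state):
    """Hypothetical actor: sample an action from the current latent state."""
    return rng.normal(size=state.shape[:-1] + (ACTION_DIM,))


# Posterior states inferred from one replayed trajectory of length T.
T = 50
posterior_states = rng.normal(size=(T, STATE_DIM))

# Every timestep becomes a start state, so one replayed trajectory of
# length T yields T imagined sub-trajectories of length H, all in parallel.
states = posterior_states                      # shape (T, STATE_DIM)
imagined = [states]
for _ in range(H):
    actions = policy(states)
    states = dynamics_step(states, actions)
    imagined.append(states)

imagined = np.stack(imagined)                  # shape (H + 1, T, STATE_DIM)
print(imagined.shape)                          # (16, 50, 8)
```

If this picture is right, my question is essentially: why roll out $T$ short sub-trajectories of length 15 from every replayed state, rather than one long imagined trajectory from a single sampled initial state?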