danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

How does dreamerv2 perform on feature-based tasks?

xlnwel opened this issue · comments

Hello. Thanks for your interesting work!

I'm planning to use DreamerV2 on some feature-based tasks. After some searching, I found that no one has tried this before. I'm wondering whether there is any difficulty in doing so. What problems would you anticipate?

Hi, it works well :)

Hi, glad to hear that!

In that case, may I further ask several questions?

  1. Why would it work well? Several recent analyses, such as MBPO, show that learning from a model can cause compounding errors: errors in the learned model accumulate as the rollout length increases. This makes me wonder why DreamerV2 works well in practice while learning entirely from imagined trajectories. For image tasks, I can understand that DreamerV2 compresses the large observation space (which contains massive redundancy) into a small latent space, enabling efficient reinforcement learning on a smaller input space. But feature-based tasks have much less redundancy, so what is the benefit of learning in the latent space?
  2. What should I take extra care with when applying DreamerV2 to feature-based tasks? For example, I want to make a few modifications to DreamerV2 to solve a multi-agent task (Overcooked). Which hyperparameters should be tuned most carefully?

Predicting in latent space tends to accumulate much less error, probably because there is no complex observation manifold that the predictions have to follow closely.

For training on proprioceptive inputs, just make sure the inputs are roughly in the range -1 to +1; velocities in particular can get very large. The default hyperparameters should work.
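One common way to keep proprioceptive inputs in that range is a running-statistics normalizer with clipping. A minimal sketch (not part of DreamerV2; the class name, clipping bound, and Welford-style update are illustrative choices):

```python
import math

class RunningNormalizer:
    """Tracks a running per-feature mean/std and rescales observations
    so they stay roughly within [-clip, +clip]."""

    def __init__(self, size, clip=5.0, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # sum of squared deviations (Welford's algorithm)
        self.clip = clip
        self.eps = eps

    def update(self, obs):
        # Online update of mean and variance statistics.
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        # Standardize each feature, then clip outliers (e.g. large velocities).
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / max(self.count - 1, 1)
            z = (x - self.mean[i]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out

# Example: features on very different scales become comparable.
norm = RunningNormalizer(size=2)
for obs in [[0.1, 500.0], [0.2, 800.0], [0.15, 650.0], [0.3, 900.0]]:
    norm.update(obs)
print(norm.normalize([0.2, 700.0]))
```

Applying this before feeding observations to the world model keeps both small position features and large velocity features at a similar magnitude.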

Thanks a ton for the explanation and advice!