Kaixhin / PlaNet

Deep Planning Network: Control from pixels by latent planning with learned dynamics

Clipping of actions

thomasbi1 opened this issue · comments

In the original implementation, while running the cross-entropy method, the sampled action sequences are clipped to min/max values (see line 40). As a result, the returned mean of the first action is within the allowed action bounds. However, when using this PyTorch port this doesn't seem to be the case: when I use a custom gym env, the actions that come from the MPCAgent lie outside the bounds of the action space. Am I missing something?

Ah, thanks for spotting this. I tried to re-implement PlaNet from the paper, and this wasn't in the paper (just double-checked Algorithm 2). I've started a fix on https://github.com/Kaixhin/PlaNet/tree/clip, but since I don't have access to MuJoCo atm I can't finish this. If you're able to do so and send a PR then we can close this issue - it just needs to return the min and max allowable action values.
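For reference, the fix being discussed can be sketched roughly like this - a minimal CEM planner that clips every sampled candidate sequence to the action bounds before refitting the distribution, so the returned mean is guaranteed to be in bounds. The function and hyperparameter names here are illustrative, not the repository's actual code, and NumPy stands in for PyTorch to keep the sketch self-contained:

```python
import numpy as np

def cem_plan(objective, action_dim, horizon, min_action, max_action,
             iterations=10, candidates=1000, top_k=100):
    """Cross-entropy method planner that keeps candidates in bounds.

    objective: maps a (horizon, action_dim) action sequence to a scalar return.
    """
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences and clip them to the action
        # bounds, so the mean refitted from the elites also stays in bounds.
        samples = mean + std * np.random.randn(candidates, horizon, action_dim)
        samples = np.clip(samples, min_action, max_action)
        returns = np.array([objective(s) for s in samples])
        elite = samples[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # first action of the planned sequence
```

Without the `np.clip` call, the elite mean can drift outside `[min_action, max_action]`, which is exactly the out-of-bounds behaviour reported above.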

Ok, I see. I haven't worked with any MuJoCo environments unfortunately and thus don't have a license. I'll be sure to submit a PR if I get around to working with MuJoCo envs.

I also forgot to add: it seems that in the original implementation the values are clipped again after adding exploration noise (I guess because the added noise can push the actions outside the bounds again, see here).

Alright - I've now clipped the actions there as well, so feel free to use the clip branch for Gym envs. I'm only going to merge into master once I get a fix for the DM Control Suite.
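The second clip mentioned here can be sketched as follows - a hypothetical helper (not the repository's code, NumPy in place of PyTorch) showing why acting with exploration noise needs its own clip even when the planner's output is already in bounds:

```python
import numpy as np

def act_with_exploration(planned_action, noise_std, min_action, max_action, rng):
    """Add Gaussian exploration noise to the planner's action, then clip again.

    The planner's output is already within bounds, but the added noise can
    push it back outside the action space, hence the second clip.
    """
    noisy = planned_action + noise_std * rng.standard_normal(planned_action.shape)
    return np.clip(noisy, min_action, max_action)
```

For example, an action of 0.9 in a [-1, 1] action space plus noise of 0.3 would otherwise be executed as 1.2.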

Had a friend get the right attributes for the DM Control Suite for me, so this is sorted and I can now close!