Kaixhin / PlaNet

Deep Planning Network: Control from pixels by latent planning with learned dynamics

Clipping of actions

thomasbi1 opened this issue · comments

In the original implementation, while running the cross-entropy method, the sampled action sequences are clipped to min/max values (see line 40). As a result, the returned mean of the first action is within the allowed action bounds. However, when using this PyTorch port this doesn't seem to be the case: when I use a custom gym env, the actions that come from the MPCAgent lie outside the bounds of the action space. Am I missing something?

Ah, thanks for spotting this. I tried to re-implement PlaNet from the paper, and this wasn't in the paper (just double-checked Algorithm 2). I've started a fix on https://github.com/Kaixhin/PlaNet/tree/clip, but since I don't have access to MuJoCo atm I can't finish this. If you're able to do so and send a PR then we can close this issue - it just needs to return the min and max allowable action values.
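For reference, the fix being discussed can be sketched roughly like this - a minimal CEM planner that clips every sampled candidate sequence to the action bounds before refitting the distribution, so the returned mean is guaranteed to be in bounds. The function and hyperparameter names here are illustrative, not the repository's actual code, and NumPy stands in for PyTorch to keep the sketch self-contained:

```python
import numpy as np

def cem_plan(objective, action_dim, horizon, min_action, max_action,
             iterations=10, candidates=1000, top_k=100):
    """Cross-entropy method planner that keeps candidates in bounds.

    objective: maps a (horizon, action_dim) action sequence to a scalar return.
    """
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences and clip them to the action
        # bounds, so the mean refitted from the elites also stays in bounds.
        samples = mean + std * np.random.randn(candidates, horizon, action_dim)
        samples = np.clip(samples, min_action, max_action)
        returns = np.array([objective(s) for s in samples])
        elite = samples[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # first action of the planned sequence
```

Without the `np.clip` call, the elite mean can drift outside `[min_action, max_action]`, which is exactly the out-of-bounds behaviour reported above.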

Ok, I see. I haven't worked with any MuJoCo environments unfortunately and thus don't have a license. I'll be sure to submit a PR if I get around to working with MuJoCo envs.

I also forgot to add: it seems that in the original implementation the values are clipped again after adding exploration noise (I guess because the added noise can push the actions outside the bounds again, see here).

Alright - I've now clipped the actions there as well, so feel free to use the clip branch for Gym envs. I'm only going to merge into master once I get a fix for the DM Control Suite.
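The second clip mentioned here can be sketched as follows - a hypothetical helper (not the repository's code, NumPy in place of PyTorch) showing why acting with exploration noise needs its own clip even when the planner's output is already in bounds:

```python
import numpy as np

def act_with_exploration(planned_action, noise_std, min_action, max_action, rng):
    """Add Gaussian exploration noise to the planner's action, then clip again.

    The planner's output is already within bounds, but the added noise can
    push it back outside the action space, hence the second clip.
    """
    noisy = planned_action + noise_std * rng.standard_normal(planned_action.shape)
    return np.clip(noisy, min_action, max_action)
```

For example, an action of 0.9 in a [-1, 1] action space plus noise of 0.3 would otherwise be executed as 1.2.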

Had a friend get the right attributes for the DM Control Suite for me, so this is sorted and I can now close!