RobertTLange / gymnax

RL Environments in JAX 🌍

Trained baseline values incl. active training

wallscheid opened this issue

Very nice work presented here, well done!

Just a small question regarding the speed-up evaluation for gymnax environments: I guess the reported execution times with a neural-network policy are based on a fixed policy without active learning (i.e., without policy improvement steps), right?

Did you also benchmark the speed-up with active learning of the policy, using standard algorithms like PPO, ES, ...?

Thank you, and excuse the late response. Yes, you are completely correct: the speed estimates are all obtained with respect to a fixed policy, i.e. either a random policy or fixed weights for the network forward passes. This is mainly because algorithm speed depends heavily on implementation details that go beyond gymnax features. I have a lot of anecdotal experience, but no hard numbers collected. Maybe I will do so eventually for a write-up. Feel free to share your experience :)
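
For reference, a minimal sketch of what such a fixed-policy timing looks like with the gymnax rollout API. The environment name, number of episodes, and episode length below are illustrative choices, not the settings behind the reported numbers:

```python
import time

import jax
from jax import lax
import gymnax

# Illustrative settings (hypothetical, not the benchmark configuration).
env, env_params = gymnax.make("CartPole-v1")
NUM_STEPS = 500       # steps per rollout
NUM_EPISODES = 2048   # parallel rollouts via vmap


def rollout(rng):
    """Roll out a fixed random policy -- no parameter updates involved."""
    rng_reset, rng_scan = jax.random.split(rng)
    obs, state = env.reset(rng_reset, env_params)

    def step_fn(carry, rng_step):
        obs, state = carry
        rng_act, rng_env = jax.random.split(rng_step)
        # Fixed policy: sample a random action each step.
        action = env.action_space(env_params).sample(rng_act)
        # done is unused here; gymnax's step auto-resets on termination.
        obs, state, reward, done, _ = env.step(rng_env, state, action, env_params)
        return (obs, state), reward

    _, rewards = lax.scan(step_fn, (obs, state), jax.random.split(rng_scan, NUM_STEPS))
    return rewards.sum()


# Batch the rollout across episodes and jit the whole batched computation.
batch_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), NUM_EPISODES)

batch_rollout(keys).block_until_ready()  # warm-up call triggers compilation
start = time.time()
batch_rollout(keys).block_until_ready()  # timed call, post-compilation
print(f"{NUM_EPISODES} episodes x {NUM_STEPS} steps: {time.time() - start:.3f}s")
```

An active-learning benchmark would additionally interleave policy updates (e.g. PPO gradient steps or ES perturbations) between such rollouts, which is exactly the part whose cost depends on implementation details outside of gymnax.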

I will close this. Feel free to reopen!