RobertTLange / gymnax

RL Environments in JAX 🌍

Trained baseline values incl. active training

wallscheid opened this issue

Very nice work presented here, well done!

Just a small question regarding the speed-up evaluation for gymnax environments: I guess the reported execution times with a neural-network policy are based on a fixed policy without active learning (i.e., without policy improvement steps), right?

Did you also benchmark the speed-up with active learning of the policy, using standard algorithms like PPO, ES, ...?

Thank you, and excuse the late response. Yes, you are completely correct: the speed estimates are all obtained with respect to a fixed policy, i.e. either a random policy or fixed weights for the network forward passes. This is mainly because algorithm speed depends heavily on implementation details that go beyond gymnax features. I have a lot of anecdotal experience, but no hard numbers collected. Maybe I will do so eventually for a write-up. Feel free to share your experience :)
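
For reference, a minimal sketch of what such a fixed-policy timing looks like with the gymnax rollout API. The environment name, number of episodes, and episode length below are illustrative choices, not the settings behind the reported numbers:

```python
import time

import jax
from jax import lax
import gymnax

# Illustrative settings (hypothetical, not the benchmark configuration).
env, env_params = gymnax.make("CartPole-v1")
NUM_STEPS = 500       # steps per rollout
NUM_EPISODES = 2048   # parallel rollouts via vmap


def rollout(rng):
    """Roll out a fixed random policy -- no parameter updates involved."""
    rng_reset, rng_scan = jax.random.split(rng)
    obs, state = env.reset(rng_reset, env_params)

    def step_fn(carry, rng_step):
        obs, state = carry
        rng_act, rng_env = jax.random.split(rng_step)
        # Fixed policy: sample a random action each step.
        action = env.action_space(env_params).sample(rng_act)
        # done is unused here; gymnax's step auto-resets on termination.
        obs, state, reward, done, _ = env.step(rng_env, state, action, env_params)
        return (obs, state), reward

    _, rewards = lax.scan(step_fn, (obs, state), jax.random.split(rng_scan, NUM_STEPS))
    return rewards.sum()


# Batch the rollout across episodes and jit the whole batched computation.
batch_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), NUM_EPISODES)

batch_rollout(keys).block_until_ready()  # warm-up call triggers compilation
start = time.time()
batch_rollout(keys).block_until_ready()  # timed call, post-compilation
print(f"{NUM_EPISODES} episodes x {NUM_STEPS} steps: {time.time() - start:.3f}s")
```

An active-learning benchmark would additionally interleave policy updates (e.g. PPO gradient steps or ES perturbations) between such rollouts, which is exactly the part whose cost depends on implementation details outside of gymnax.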

I will close this. Feel free to reopen!