ppo_nn_agent.gin hyperparam tuning
amirjamez opened this issue
Hi @yundiqian. I was skimming through the hyperparams in https://github.com/google/ml-compiler-opt/blob/main/compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin and it seems counterintuitive to me that both PPOAgent.normalize_rewards and PPOAgent.normalize_observations are set to False. Would you be able to provide some info on that? Looking at the TF-Agents codebase (https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/ppo_agent.py#L206), normalizing rewards and observations is advised, so I was wondering if you had tried these settings before?
Thanks!
-Amir
This is a great question! Yes, normalization helps, but we turned it off because we do the normalization ourselves, so we don't rely on TF-Agents' normalization; i.e., the input to the agent is already 'normalized' to a reasonable value range.
However, I tuned the parameters a long time ago, so I don't remember the details of tuning this particular one. Feel free to try tuning it yourself, and let us know if you find it helpful!
I see. So that's basically the job of the bucketization. Sure, I'll give it a try and update this thread.
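For anyone reading later: a rough sketch of what quantile-bucket normalization looks like in general (the function names and parameters below are my own illustration, not taken from the ml-compiler-opt repo). Raw feature values are mapped to the fraction of precomputed quantile boundaries they exceed, which squashes heavy-tailed features into [0, 1] before they ever reach the agent, making the agent-side normalization flags redundant:

```python
import numpy as np

def make_bucket_normalizer(sample_values, num_buckets=100):
    """Build a normalizer from sample feature values (hypothetical helper).

    Precomputes num_buckets - 1 quantile boundaries; a raw value is then
    normalized to the fraction of boundaries it exceeds, i.e. its
    approximate empirical CDF value in [0, 1].
    """
    boundaries = np.quantile(
        sample_values, np.linspace(0.0, 1.0, num_buckets + 1)[1:-1])

    def normalize(x):
        # searchsorted counts how many boundaries x exceeds.
        return np.searchsorted(boundaries, x) / len(boundaries)

    return normalize

# Example: a heavy-tailed raw feature (e.g., instruction counts).
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=5.0, sigma=2.0, size=10_000)
normalize = make_bucket_normalizer(samples)

# The median lands near the middle of the normalized range.
print(normalize(np.median(samples)))
```

The key property is that the output range is bounded and roughly uniform regardless of the raw feature's scale, which is why an additional running-statistics normalizer inside the agent adds little.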