Reward scaling
ADGEfficiency opened this issue
BonsAI have a great video on reward function design.
The technique most relevant to energy_py is scaling each reward by the maximum possible reward (max reward = peak capacity * max price). The drawback is that the scaled reward is no longer directly interpretable as money.
This can be dealt with by including a cost in the env info dict.
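A rough sketch of what this could look like, not tied to the actual energy_py env API. The function names `scale_reward` and `step_outputs`, the `"cost"` key, and the assumption that reward is the negative of the monetary cost are all hypothetical choices made for illustration.

```python
def scale_reward(reward, peak_capacity, max_price):
    """Divide a monetary reward by the max reward (peak capacity * max price)."""
    max_reward = peak_capacity * max_price
    if max_reward <= 0:
        raise ValueError("peak_capacity * max_price must be positive")
    return reward / max_reward


def step_outputs(cost, peak_capacity, max_price):
    """Return (scaled_reward, info) for one step.

    The agent sees the scaled reward; the unscaled cost is kept in the
    info dict so results can still be read as money.
    """
    reward = -cost  # assumption: reward is the negative of the monetary cost
    scaled = scale_reward(reward, peak_capacity, max_price)
    info = {"cost": cost}  # hypothetical key name
    return scaled, info


# Illustrative numbers only: cost of 50 with a max reward of 2 * 100 = 200
scaled, info = step_outputs(cost=50.0, peak_capacity=2.0, max_price=100.0)
```

With this split the learning signal stays in a bounded range while anything that reports results can read `info["cost"]` instead of the reward.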
Part of this work would be looking at how the distribution of rewards changes before and after scaling.
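One simple way to start on that comparison is to compute summary statistics on a batch of rewards before and after scaling. The sketch below uses only the standard library; the reward values, the capacity and price figures, and the `summarize` helper are made up for illustration and are not taken from energy_py.

```python
import statistics


def summarize(rewards):
    """Basic summary statistics for a list of rewards."""
    return {
        "min": min(rewards),
        "max": max(rewards),
        "mean": statistics.mean(rewards),
        "std": statistics.pstdev(rewards),
    }


# Made-up monetary rewards, for illustration only
raw_rewards = [-120.0, -40.0, 0.0, 35.0, 150.0]

# Hypothetical peak capacity of 2 and max price of 100
max_reward = 2.0 * 100.0
scaled_rewards = [r / max_reward for r in raw_rewards]

before = summarize(raw_rewards)
after = summarize(scaled_rewards)
```

Because the scaling divides by a constant, the shape of the distribution is unchanged; only its range, mean and standard deviation shrink by the factor `max_reward`.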