BernoulliBandit observation space bounds are incorrect when time normalisation is enabled.
jaronsgit opened this issue · comments
Jaron Cohen commented
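With normalize_time: bool = True, the step count in the observation is normalised to the range [-1, 1], but the observation space still declares bounds of 0 and params.max_steps_in_episode = 100 for that entry. The normalised time value can therefore fall below the declared lower bound.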
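A minimal sketch of the mismatch, in plain Python rather than the library's own code. The helper names (`normalize_time`, `time_bounds`, `in_bounds`) and the linear mapping onto [-1, 1] are assumptions made for illustration; only the reported bounds (0 and 100) and the normalised range come from the report above.

```python
def normalize_time(t, max_steps, low=-1.0, high=1.0):
    """Linearly map a step count in [0, max_steps] onto [low, high] (assumed mapping)."""
    return (high - low) * t / max_steps + low


def time_bounds(max_steps, normalized):
    """Hypothetical helper: bounds for the time entry that match the normalisation flag."""
    if normalized:
        return (-1.0, 1.0)
    return (0.0, float(max_steps))


def in_bounds(value, bounds):
    """Check whether a value lies inside inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


max_steps = 100  # params.max_steps_in_episode in the report

# Bounds as described in the report: unchanged even though time is normalised.
reported_bounds = (0.0, float(max_steps))
# Bounds chosen to match the normalisation flag.
consistent_bounds = time_bounds(max_steps, normalized=True)

t_start = normalize_time(0, max_steps)        # first step of an episode
t_end = normalize_time(max_steps, max_steps)  # last step of an episode

print(in_bounds(t_start, reported_bounds))    # outside the reported bounds
print(in_bounds(t_start, consistent_bounds))  # inside the matching bounds
```

Under these assumptions, the normalised time at the start of an episode lies outside the reported bounds, so any check that validates observations against the declared space would reject it. Making the bounds depend on the same flag is one way to keep the two consistent; the merged change may differ in detail.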
Robert Tjarko Lange commented
Thank you so much @jaronsgit -- it is merged and will be part of the next release. Cheers, Rob