ValueError when using temperature_action_probs as p argument in np.random.choice
martinholecekmax opened this issue
When temperature_action_probs is passed as the p argument to np.random.choice, a ValueError is raised with the message "probabilities do not sum to 1". This occurs because the probabilities are raised element-wise to the power of 1 / self.args["temperature"]. An element-wise power does not preserve the sum, so for any temperature other than 1 the values generally no longer sum to 1.
To fix this issue, you can normalize temperature_action_probs so that its values sum to 1. You can do this by dividing temperature_action_probs by its sum:
temperature_action_probs /= np.sum(temperature_action_probs)
This will ensure that the values in temperature_action_probs sum to 1, which will allow you to use it as the p argument for np.random.choice.
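A minimal sketch of the normalization in context. The function name `sample_action` and the example values are hypothetical and not taken from the repository; only the power and the division by the sum come from the description above.

```python
import numpy as np


def sample_action(action_probs, temperature):
    """Apply temperature to a probability vector, renormalize, and sample.

    Hypothetical helper for illustration; names are assumptions.
    """
    # Element-wise power; the result usually no longer sums to 1.
    temperature_action_probs = np.asarray(action_probs, dtype=np.float64) ** (1 / temperature)
    # Renormalize so the vector is a valid distribution again.
    temperature_action_probs /= np.sum(temperature_action_probs)
    # Sample an action index using the renormalized probabilities.
    action = np.random.choice(len(temperature_action_probs), p=temperature_action_probs)
    return action, temperature_action_probs


action, probs = sample_action([0.5, 0.3, 0.2], temperature=1.25)
```

The division assumes at least one entry is nonzero; an all-zero vector would need separate handling.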
Steps to Reproduce:
Raise a normalized probability vector to the power of 1 / self.args["temperature"] with a temperature other than 1, then pass the result as the p argument to np.random.choice.
Observe the ValueError with the message "probabilities do not sum to 1".
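A standalone reproduction might look like the following. The probability values and the temperature of 1.25 are made up for illustration; in the project the temperature would come from self.args["temperature"].

```python
import numpy as np

# Illustrative values only; any normalized vector with temperature != 1 behaves similarly.
action_probs = np.array([0.5, 0.3, 0.2])
temperature = 1.25

# Same transformation as described in the issue, without renormalizing.
temperature_action_probs = action_probs ** (1 / temperature)
total = np.sum(temperature_action_probs)  # noticeably different from 1

try:
    np.random.choice(len(action_probs), p=temperature_action_probs)
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(total, raised, message)
```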
Expected Behavior:
The np.random.choice function should be able to accept temperature_action_probs as the p argument without raising a ValueError.
Actual Behavior:
A ValueError is raised with the message "probabilities do not sum to 1".
Fix:
Normalize temperature_action_probs so that its values sum to 1 by dividing it by its sum:
temperature_action_probs /= np.sum(temperature_action_probs)