ValueError when using temperature_action_probs as p argument in np.random.choice

Question

martinholecekmax opened this issue a year ago

When temperature_action_probs is passed as the p argument to np.random.choice, a ValueError is raised with the message "probabilities do not sum to 1". This occurs because temperature_action_probs is computed by raising the probabilities to the power of 1 / self.args["temperature"]; unless the temperature is exactly 1, the resulting values generally no longer sum to 1.
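To illustrate the cause, here is a minimal sketch that reproduces the error outside the original code. The probability values and the temperature of 1.25 are made-up examples, and the plain variable `temperature` stands in for self.args["temperature"]; none of these come from the original project.

```python
import numpy as np

# Example values only (assumptions, not taken from the original code).
action_probs = np.array([0.5, 0.3, 0.2])  # a valid distribution, sums to 1
temperature = 1.25                        # stands in for self.args["temperature"]

# Raising probabilities to a power other than 1 changes their sum.
temperature_action_probs = action_probs ** (1 / temperature)
print(temperature_action_probs.sum())     # larger than 1 for these values

try:
    np.random.choice(len(action_probs), p=temperature_action_probs)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

With an exponent below 1 every probability in (0, 1) grows, so the sum exceeds 1; with an exponent above 1 it shrinks below 1. Either way np.random.choice rejects the array.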

To fix this issue, you can normalize temperature_action_probs so that its values sum to 1. You can do this by dividing temperature_action_probs by its sum:

temperature_action_probs /= np.sum(temperature_action_probs)

This will ensure that the values in temperature_action_probs sum to 1, which will allow you to use it as the p argument for np.random.choice.
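As a rough sketch of how the normalization fits into the sampling step, assuming the input is a NumPy array of non-negative probabilities and the temperature is a positive float. The helper name sample_action and the example values are hypothetical and not part of the original code.

```python
import numpy as np

def sample_action(action_probs, temperature):
    """Hypothetical helper: apply temperature, renormalize, then sample an index."""
    probs = np.asarray(action_probs, dtype=np.float64)
    temperature_action_probs = probs ** (1 / temperature)

    total = np.sum(temperature_action_probs)
    if total <= 0:
        # Nothing to sample from if every entry is zero.
        raise ValueError("action_probs must contain at least one positive entry")

    # The fix from this issue: divide by the sum so the values sum to 1 again.
    temperature_action_probs /= total

    action = np.random.choice(len(temperature_action_probs), p=temperature_action_probs)
    return action, temperature_action_probs

# Example usage with made-up values.
action, probs = sample_action([0.5, 0.3, 0.2], temperature=1.25)
print(action, probs, probs.sum())
```

The guard against a zero sum is an extra precaution added in this sketch; the fix described in the issue is only the division by the sum.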

Steps to Reproduce:

1. Run the code with temperature_action_probs as the p argument in np.random.choice.
2. Observe the ValueError with the message "probabilities do not sum to 1".

Expected Behavior:

The np.random.choice function should be able to accept temperature_action_probs as the p argument without raising a ValueError.

Actual Behavior:

A ValueError is raised with the message "probabilities do not sum to 1".

Fix:

Normalize temperature_action_probs so that its values sum to 1 by dividing it by its sum:

temperature_action_probs /= np.sum(temperature_action_probs)