Epsilon Greedy: are the accuracy and performance graphs sensitive to the declaration order of the arms?

Question

phsimon opened this issue 2 years ago

While writing code to reproduce the performance curves shown in the chapter "Analyzing Results from a Monte Carlo Study" (chapter 4), Approach 1 (probability of selecting the best arm) and Approach 2 (average reward), I noticed that changing the order of the arms can change the curves dramatically.
For instance, if I declare the means of my five arms as follows:
means=[0.8, 0.9, 0.1, 0.5, 0.5]
n_arms=len(means)
random.shuffle(means)
arms=[BernoulliArm(mu) for mu in means]

The shuffle changes the order of the means list, and therefore the order of the arms, and the resulting performance curves can be very different.

For instance, with [0.9, 0.5, 0.1, 0.5, 0.8] as the order after shuffling, I get this curve (average reward over time):

whereas with [0.1, 0.5, 0.9, 0.5, 0.8], I get:

Do you have an explanation? The same phenomenon shows up for the probability of selecting the best arm, though it is less pronounced.

Parameters:
num_sims=1000
horizon=250
epsilon=0.1
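
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch of the setup described above. The `BernoulliArm` and `EpsilonGreedy` classes are simplified stand-ins written for this example and are not necessarily the book's code; the tie-breaking rule in `select_arm` (first index with the highest estimate) is an assumption, and `average_reward_curve` is a hypothetical helper name.

```python
import random


class BernoulliArm:
    """Arm that pays 1.0 with probability p, otherwise 0.0."""

    def __init__(self, p):
        self.p = p

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0


class EpsilonGreedy:
    """Simplified stand-in (assumption: not necessarily the book's code)."""

    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # every estimate starts at 0.0

    def select_arm(self):
        if random.random() > self.epsilon:
            # Assumed tie-breaking: list.index returns the FIRST maximum,
            # so tied estimates resolve to the lowest arm index.
            return self.values.index(max(self.values))
        return random.randrange(len(self.values))

    def update(self, arm, reward):
        # Incremental running mean of the rewards seen for this arm.
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


def average_reward_curve(means, epsilon=0.1, num_sims=1000, horizon=250, seed=0):
    """Average reward at each time step over num_sims independent runs."""
    random.seed(seed)
    totals = [0.0] * horizon
    for _ in range(num_sims):
        arms = [BernoulliArm(mu) for mu in means]
        algo = EpsilonGreedy(epsilon, len(arms))
        for t in range(horizon):
            chosen = algo.select_arm()
            reward = arms[chosen].draw()
            algo.update(chosen, reward)
            totals[t] += reward
    return [total / num_sims for total in totals]


# The two orderings from the question, with the same parameters.
best_first = average_reward_curve([0.9, 0.5, 0.1, 0.5, 0.8])
worst_first = average_reward_curve([0.1, 0.5, 0.9, 0.5, 0.8])

for t in (0, 9, 49, 249):
    print(t + 1, round(best_first[t], 3), round(worst_first[t], 3))
```

In this sketch all estimates start at 0.0 and ties go to the lowest index, so the first declared arm is favored in the early rounds. Whether that accounts for the curves above depends on how the actual implementation breaks ties.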