Multi-armed bandit problem

All simulations were run with a horizon of 50,000 steps and 10 arms, each arm having a Gaussian reward distribution with mean in the range [1, 2] and standard deviation 1.
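The setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the simulations: the function names (`make_arm_means`, `pull`, `run`, `uniform_policy`) are hypothetical, the arm means are assumed to be drawn uniformly from [1, 2] (the text only gives the range), and the uniform-random policy is a placeholder for whichever bandit algorithm is being evaluated.

```python
import random

HORIZON = 50_000   # number of pulls per simulation (from the text)
N_ARMS = 10        # number of arms (from the text)
SIGMA = 1.0        # standard deviation of every arm's reward (from the text)


def make_arm_means(n_arms=N_ARMS, low=1.0, high=2.0, rng=random):
    """Return one mean per arm.

    Assumption: means are drawn uniformly from [low, high]; the text
    only states that they lie in the range [1, 2].
    """
    return [rng.uniform(low, high) for _ in range(n_arms)]


def pull(means, arm, rng=random):
    """Sample a Gaussian reward for the chosen arm."""
    return rng.gauss(means[arm], SIGMA)


def run(policy, means, horizon=HORIZON, rng=random):
    """Play `policy` for `horizon` steps.

    Returns (total reward, cumulative pseudo-regret), where the
    pseudo-regret adds up the gap between the best mean and the
    mean of the arm actually pulled.
    """
    best = max(means)
    total_reward = 0.0
    regret = 0.0
    for t in range(horizon):
        arm = policy(t, len(means), rng)
        total_reward += pull(means, arm, rng)
        regret += best - means[arm]
    return total_reward, regret


def uniform_policy(t, n_arms, rng):
    """Placeholder policy (hypothetical): pick an arm uniformly at random."""
    return rng.randrange(n_arms)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    means = make_arm_means(rng=rng)
    total, regret = run(uniform_policy, means, rng=rng)
    print(f"total reward: {total:.1f}, pseudo-regret: {regret:.1f}")
```

Any policy with the signature `policy(t, n_arms, rng)` can be passed to `run`; a learning algorithm would additionally need to observe the rewards, which this sketch leaves out for brevity.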