What Is the Point of Using "n" in the UCB Formula?
Parnia opened this issue · comments
par commented
Hello,
In "ucb1.py" in the "rl" folder, which solves the bandit problem, what is the point of using n (the total number of plays) in the UCB formula? The formula is: mean + np.sqrt(2*np.log(n) / nj)
I tested the following two formulas (without n) instead, and both worked fine:
mean + np.sqrt(2 / nj)
and even
mean + (1 / nj)
I also tested them with different total numbers of plays, and the agents' final results were very similar in every case.
I would be grateful if you could elaborate on the role of n in the formula.
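To make the comparison concrete, here is a minimal sketch of the three exploration bonuses above. It is not the code from "ucb1.py"; the function names and the sample values of n and nj are made up for illustration. It prints each bonus for an arm whose play count nj stays fixed while the total number of plays n grows. Only the first bonus depends on n.

```python
import numpy as np

# Hypothetical helper names; only the formulas come from the post above.

def bonus_ucb1(n, nj):
    """Bonus from the UCB formula: sqrt(2 * log(n) / nj)."""
    return np.sqrt(2 * np.log(n) / nj)

def bonus_no_log(nj):
    """First variant without n: sqrt(2 / nj)."""
    return np.sqrt(2 / nj)

def bonus_inverse(nj):
    """Second variant without n: 1 / nj."""
    return 1 / nj

if __name__ == "__main__":
    nj = 10  # assumed play count of one arm, held fixed
    for n in (100, 1_000, 10_000):  # assumed total numbers of plays
        print(n, bonus_ucb1(n, nj), bonus_no_log(nj), bonus_inverse(nj))
```

With nj held fixed, the first column of bonuses increases with n, while the other two stay constant.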
Best,
Parnia
LazyProgrammer.me commented
Please post your course-related questions on the course Q&A.