fidelity / mab2rec

[AAAI 2024] Mab2Rec: Multi-Armed Bandits Recommender

Home Page: https://fidelity.github.io/mab2rec/


regret metrics

Akshaysharma29 opened this issue · comments

Hi Team, thanks for sharing this wonderful library. Can you guide me on how to use regret metrics with MAB?

Hey @Akshaysharma29

In the context of multi-armed bandits, regret is typically defined as the difference between the cumulative reward that would have been obtained by making the optimal decision at each time step and the cumulative reward actually observed under the selected policy. In practice, however, we usually do not know the optimal decisions (otherwise we would simply use them). Our implementation therefore focuses on finding policies that maximize the cumulative reward rather than reporting regret directly.
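When the optimal per-step reward *is* known (e.g., in a simulation with synthetic data), regret can be computed outside the library. Below is a minimal, hypothetical sketch of that idea; note that `cumulative_regret` is not part of the mab2rec API, just an illustration of the definition above.

```python
import numpy as np

def cumulative_regret(optimal_rewards, observed_rewards):
    """Cumulative regret after each step t: sum over s <= t of (r*_s - r_s),
    i.e., the gap between the best achievable and the observed reward so far.
    Assumes the optimal reward at each step is known (simulation setting)."""
    optimal = np.asarray(optimal_rewards, dtype=float)
    observed = np.asarray(observed_rewards, dtype=float)
    return np.cumsum(optimal - observed)

# Example: the optimal arm pays 1.0 each step; the policy sometimes
# selects a worse arm, so regret accumulates on those steps.
regret = cumulative_regret([1.0, 1.0, 1.0, 1.0],
                           [1.0, 0.4, 0.7, 1.0])
print(regret)  # approximately [0, 0.6, 0.9, 0.9]
```

Once the regret curve flattens, the policy has effectively locked onto the best arm.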

Makes sense @bkleyn. Thank you 👍