fidelity / mab2rec

[AAAI 2024] Mab2Rec: Multi-Armed Bandits Recommender

Home Page: https://fidelity.github.io/mab2rec/


regret metrics

Akshaysharma29 opened this issue · comments

Hi Team, thanks for sharing this wonderful library. Can you guide me on how to use regret metrics with MAB?

Hey @Akshaysharma29

In the context of multi-armed bandits, regret is typically defined as the difference between the cumulative reward that would have been obtained by making the optimal decision at each time step and the cumulative reward actually observed under the selected policy. In practice, however, we usually do not know the optimal decisions (otherwise we would simply use them). Our implementation therefore focuses on finding policies that maximize the cumulative reward rather than reporting regret directly.
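When the optimal per-step reward *is* known (e.g., in a simulation with synthetic data), regret can be computed outside the library. Below is a minimal, hypothetical sketch of that idea; note that `cumulative_regret` is not part of the mab2rec API, just an illustration of the definition above.

```python
import numpy as np

def cumulative_regret(optimal_rewards, observed_rewards):
    """Cumulative regret after each step t: sum over s <= t of (r*_s - r_s),
    i.e., the gap between the best achievable and the observed reward so far.
    Assumes the optimal reward at each step is known (simulation setting)."""
    optimal = np.asarray(optimal_rewards, dtype=float)
    observed = np.asarray(observed_rewards, dtype=float)
    return np.cumsum(optimal - observed)

# Example: the optimal arm pays 1.0 each step; the policy sometimes
# selects a worse arm, so regret accumulates on those steps.
regret = cumulative_regret([1.0, 1.0, 1.0, 1.0],
                           [1.0, 0.4, 0.7, 1.0])
print(regret)  # approximately [0, 0.6, 0.9, 0.9]
```

Once the regret curve flattens, the policy has effectively locked onto the best arm.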

Makes sense @bkleyn. Thank you 👍