eugeneyan / eugeneyan-comments


https://eugeneyan.com/writing/bandits/

utterances-bot opened this issue · comments

Bandits for Recommender Systems

Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.

https://eugeneyan.com/writing/bandits/

I previously built a recommendation system that approximated a bandit while avoiding the need to actually estimate reward distributions. Our intuition was that the ranked list produced by the recommender typically included some really good items, but those items could remain deeply buried until the recommender had gathered enough feedback to rank them where they belonged. Because such buried items appeared on the second or later page of recommendations, very little information was ever gathered about them, and the recommender would stay stuck exploiting relatively poor items.

Our approach was to reorder the results so that items near the top could move a little while deeper items could move a lot. We did this by sorting on log(r) + ε, where ε ~ N(0, k). The parameter k is set small to reduce exploration or large to increase it. In early systems we set k globally, but it can also be set to something like 1/n, where n is a composite measure of how much training data we have on the top items in the recommended list.
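For concreteness, here's a minimal sketch of that re-ranking in Python/NumPy. The function name `perturbed_rerank`, the positive-score assumption (so log(r) is defined), and reading k as the standard deviation of ε are my assumptions; N(0, k) alone doesn't say whether k is the variance or the standard deviation.

```python
import numpy as np

def perturbed_rerank(scores, k=0.1, rng=None):
    """Re-rank items by log(score) + Gaussian noise.

    scores: positive relevance scores r from the base recommender.
    k: noise scale, treated here as the std dev of epsilon (assumption;
       N(0, k) in the comment above is ambiguous). Small k ~ mostly
       exploitation; large k ~ more exploration.
    Returns indices of items in the new (perturbed) ranking order.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    noisy = np.log(scores) + rng.normal(0.0, k, size=scores.shape)
    # Sort descending: highest perturbed score ranks first.
    return np.argsort(-noisy)

# Example: near-tied top items can swap; deeply buried items occasionally jump.
ranking = perturbed_rerank([0.9, 0.85, 0.3, 0.05], k=0.3)

# The 1/n idea from the comment above, sketched: anneal exploration
# as feedback on the top items accumulates (n_feedback is hypothetical).
# k = 1.0 / n_feedback
```

One nice property of perturbing in log space: item j jumps above item i exactly when ε_j − ε_i > log(r_i / r_j), so swaps depend on score ratios rather than absolute gaps. Near-tied items shuffle freely while confidently separated items mostly stay put.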

The primary benefit of this approach is that it captures much of the benefit of more sophisticated true bandit approaches (CTR increases of >100% were common in music and video recommendation), and it can be bolted onto any plausible recommendation engine.

Thanks for an informative article, Eugene! Can you also share alternative approaches that might work when the item set changes very quickly (e.g., job listings on a jobs platform)?