Summary for BwK and Pure Exploration
johnantonn opened this issue
Pure exploration:
- https://arxiv.org/pdf/0802.2655.pdf
- https://arxiv.org/pdf/1803.04665.pdf
- https://arxiv.org/pdf/1602.04589.pdf
- http://proceedings.mlr.press/v28/karnin13.pdf
- https://arxiv.org/pdf/1605.08671.pdf
- https://arxiv.org/pdf/1502.07943.pdf
- https://arxiv.org/pdf/1407.4443.pdf
- https://arxiv.org/pdf/1605.09004.pdf
BwK:
- https://dl.acm.org/doi/pdf/10.1145/3164539
- https://conferences.computer.org/focs/2019/pdfs/FOCS2019-7pBwCpNH4Mz2L4MJWVl6Xp/449NV6Undj5bRx1xZqVv04/5eGtcNbsnkv7O89B5SOnxw.pdf
- https://arxiv.org/pdf/1811.11881.pdf
- https://arxiv.org/pdf/2006.10459.pdf
- http://proceedings.mlr.press/v84/grover18b/grover18b.pdf
- http://grail.cs.washington.edu/projects/bandit/banditpaper.pdf
- https://arxiv.org/pdf/2002.00253.pdf
- https://arxiv.org/pdf/2102.06385.pdf
- Bandits with one knapsack
Videos:
People/Groups:
Many publications make the odd-looking assumption that the reward distributions are supported on [0, 1]. It isn't clear to me whether this is a minimal (normalizing) assumption or a hard requirement, i.e. whether the derived results carry over to arbitrary reward distributions.
https://arxiv.org/pdf/1805.05071.pdf
The support of a real-valued function f is defined as the set of all x such that f(x) ≠ 0. So what does this mean for the multi-armed bandit setting, and for our problem instance where the reward of an arm is its AUC score? I think this is very important for the assumptions and results on which we're basing our solution.