PratishMashankar / multi-armed-bandit

Implementation of the Greedy, UCB1, and Thompson Sampling algorithms for the multi-armed bandit problem

Multi Armed Bandit Case Study

Author: Pratish Mashankar, pmashank@gmu.edu
Guide: Dr. Sanmay Das, sanmay@gmu.edu

Introduction

I implemented three distinct algorithms for the multi-armed Bernoulli bandit problem: the greedy algorithm, UCB1, and Thompson Sampling. To evaluate their performance, I tested them on two bandit settings. The first setting involved an eleven-armed bandit with payoff probabilities ranging from 0 to 1.0, while the second featured a five-armed bandit with probabilities 0.3, 0.5, 0.7, 0.83, and 0.85. Using these scenarios, I systematically examined the empirical properties of the algorithms, focusing on regret over time and the probability of selecting the optimal action over time. Each experiment was run at three horizons, 10^3, 10^4, and 10^5 time steps, first as a single run and then repeated 100 times from a fresh start with the results averaged across runs.
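
The actual implementations are in the notebook; the sketch below is only an illustration of the setup described above. It assumes common design choices that the report may or may not follow exactly: Beta(1,1) priors for Thompson Sampling, an initial round-robin pull of every arm before Greedy and UCB1 start using their indices, and the standard UCB1 exploration bonus. The function name `run_bandit` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(probs, horizon, policy):
    """Simulate one run on a Bernoulli bandit and return the cumulative (expected) regret."""
    n_arms = len(probs)
    counts = np.zeros(n_arms)      # number of pulls per arm
    successes = np.zeros(n_arms)   # total observed reward per arm
    best = max(probs)
    regret = 0.0
    for t in range(horizon):
        if policy == "thompson":
            # sample a plausible mean for each arm from its Beta(1,1)-prior posterior
            arm = int(np.argmax(rng.beta(successes + 1, counts - successes + 1)))
        elif t < n_arms:
            # greedy and UCB1: pull each arm once before relying on the index
            arm = t
        elif policy == "greedy":
            arm = int(np.argmax(successes / counts))
        else:  # "ucb1": empirical mean plus exploration bonus
            arm = int(np.argmax(successes / counts + np.sqrt(2 * np.log(t) / counts)))
        reward = rng.binomial(1, probs[arm])
        counts[arm] += 1
        successes[arm] += reward
        regret += best - probs[arm]   # expected regret incurred by this pull
    return regret

# Five-armed setting from the case study, averaged over 100 independent runs
probs = [0.3, 0.5, 0.7, 0.83, 0.85]
for policy in ("greedy", "ucb1", "thompson"):
    avg = np.mean([run_bandit(probs, 10**3, policy) for _ in range(100)])
    print(f"{policy}: average cumulative regret after 10^3 steps = {avg:.1f}")
```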

The experiments can be reproduced by running all cells in the attached Jupyter Notebook. A detailed report is available as a PDF in the repo.

Declaration

The ideas in this submission are original and were generated by Pratish Mashankar. ChatGPT was used as an editorial assistant; however, I take full responsibility for the originality and accuracy of the content. This case study was submitted in partial fulfillment of the credit requirements for CS688 Machine Learning at GMU under Professor Sanmay Das.
