zhpinkman / armed-bandit

Solving n-armed bandit problems using different policies to find the choice (e.g., the network path) that yields the least regret. Some of the policies used in this project were policy gradient and Thompson sampling. All the environments and agents are implemented with the Amalearn library. This project was carried out as part of the Reinforcement Learning master's course offered at the University of Tehran under the supervision of Prof. Nili.
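One common form of policy gradient for bandits is the gradient bandit algorithm: the agent keeps a preference per arm, acts with a softmax over those preferences, and nudges the preferences by stochastic gradient ascent on expected reward, using the running average reward as a baseline. The following is a minimal NumPy sketch under stated assumptions. It does not use Amalearn, and the function names, the unit-variance Gaussian reward model, and the step size are hypothetical choices for illustration, not the project's implementation.

```python
import numpy as np


def softmax(h):
    """Numerically stable softmax over action preferences."""
    z = np.exp(h - h.max())
    return z / z.sum()


def gradient_bandit(true_means, steps=2000, alpha=0.1, seed=0):
    """Hypothetical sketch of the gradient bandit algorithm.

    Rewards are assumed Gaussian with unit variance around true_means.
    Returns the final softmax policy and the reward received at each step.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    h = np.zeros(k)       # action preferences, one per arm
    baseline = 0.0        # running average of past rewards
    rewards = np.empty(steps)
    for t in range(steps):
        pi = softmax(h)
        a = rng.choice(k, p=pi)
        r = rng.normal(true_means[a], 1.0)
        rewards[t] = r
        # Stochastic gradient ascent on expected reward:
        # H(b) += alpha * (r - baseline) * (1[b == a] - pi(b)) for every arm b.
        indicator = np.zeros(k)
        indicator[a] = 1.0
        h += alpha * (r - baseline) * (indicator - pi)
        # Update the baseline after the preference step.
        baseline += (r - baseline) / (t + 1)
    return softmax(h), rewards


if __name__ == "__main__":
    policy, _ = gradient_bandit([0.0, 0.5, 1.0, 3.0])  # hypothetical 4-armed bandit
    print("final policy:", np.round(policy, 3))
```

With a clear gap between arms, the returned policy should put most of its probability on the best arm; the baseline reduces the variance of the updates without changing their expected direction.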

Detailed information about each part of the project can be found in the results section.

  • Packet-routing

The task of finding the best route for transferring a packet through a congested network.

  • Thompson-sampling-greedy-policies

Comparing Thompson sampling and greedy policies on a 10-armed bandit task.

  • Waiting-monetary-value-prospect-theory

Investigating the monetary value of waiting time using the prospect theory of Daniel Kahneman and Amos Tversky.
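The Thompson sampling versus greedy comparison on a 10-armed bandit can be illustrated with a minimal sketch. It assumes Bernoulli rewards, Beta(1, 1) priors for Thompson sampling, and a purely greedy sample-average agent that pulls each arm once before exploiting; it reports cumulative expected regret. The function name `run_bandit` and the reward model are hypothetical and need not match the project's Amalearn environments.

```python
import numpy as np


def run_bandit(probs, policy, steps=2000, seed=0):
    """Run one policy on a Bernoulli bandit; return cumulative regret per step."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    k = len(probs)
    successes = np.zeros(k)   # number of reward-1 pulls per arm
    failures = np.zeros(k)    # number of reward-0 pulls per arm
    best = probs.max()
    regret = np.empty(steps)
    total = 0.0
    for t in range(steps):
        if policy == "thompson":
            # Sample a success rate for each arm from its Beta posterior
            # (Beta(1, 1) prior) and pull the arm with the largest sample.
            samples = rng.beta(successes + 1, failures + 1)
            a = int(np.argmax(samples))
        elif policy == "greedy":
            # Pull each arm once, then always exploit the best sample mean.
            pulls = successes + failures
            if t < k:
                a = t
            else:
                a = int(np.argmax(successes / pulls))
        else:
            raise ValueError(f"unknown policy: {policy}")
        r = float(rng.random() < probs[a])
        successes[a] += r
        failures[a] += 1.0 - r
        total += best - probs[a]  # expected regret of this pull
        regret[t] = total
    return regret


if __name__ == "__main__":
    probs = np.linspace(0.1, 0.9, 10)  # hypothetical 10-armed bandit
    for name in ("greedy", "thompson"):
        print(name, "final regret:", run_bandit(probs, name)[-1])
```

The greedy agent can lock onto an arm that happened to pay off early, whereas Thompson sampling keeps exploring in proportion to its posterior uncertainty; averaging the regret curves over many seeds gives a fairer comparison than a single run.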
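Prospect theory values outcomes as gains or losses relative to a reference point rather than by their expected monetary value: the value function is concave for gains and convex, and steeper, for losses (loss aversion). The sketch below uses the power-function form and the median parameter estimates (alpha = beta = 0.88, lambda = 2.25) reported by Tversky and Kahneman (1992); whether the project used this exact form or these parameters is an assumption, and `prospect_value` is a hypothetical name.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of an outcome x relative to a reference point of 0.

    Gains are valued as x**alpha (concave); losses as -lam * (-x)**beta
    (convex and steeper, reflecting loss aversion).
    """
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta


if __name__ == "__main__":
    # A $100 loss weighs more heavily than a $100 gain.
    print(prospect_value(100), prospect_value(-100))
```

In a waiting-time setting, x would be the monetary outcome measured against the participant's reference point; how that reference point is chosen is a modeling decision not specified here.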


Languages

Python 95.6%, Jupyter Notebook 4.4%