zhpinkman/armed-bandit

Solving n-armed bandit problems with different policies in order to find the path with the least regret. The policies used in this project include policy gradient and Thompson sampling. All environments and agents are implemented with the aid of the Amalearn library. This project was carried out as part of the Reinforcement Learning graduate course offered at the University of Tehran under the supervision of Prof. Nili.

You can find all the information about each part of the project in the results section.

Packet-routing

Finding the best route for transferring a packet through a congested network

Thompson-sampling-greedy-policies

Comparing Thompson sampling and greedy policies on a 10-armed bandit task
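
As a rough illustration of what such a comparison can look like, here is a minimal sketch that uses only the Python standard library, not the Amalearn API. It assumes Bernoulli rewards and a Beta(1, 1) prior for Thompson sampling; the function name `run_bandit`, the arm probabilities, and the step count are hypothetical and not taken from the project.

```python
import random


def run_bandit(policy, probs, steps=2000, seed=0):
    """Play a Bernoulli bandit and return the cumulative expected regret.

    policy: "thompson" or "greedy" (hypothetical labels for this sketch).
    probs:  true success probability of each arm (unknown to the agent).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    wins = [0] * n_arms    # number of rewards equal to 1, per arm
    losses = [0] * n_arms  # number of rewards equal to 0, per arm
    best = max(probs)
    regret = 0.0

    for _ in range(steps):
        if policy == "thompson":
            # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
            scores = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                      for a in range(n_arms)]
        else:
            # Greedy: empirical mean reward; untried arms score +inf so that
            # every arm is pulled once before exploitation starts.
            scores = [wins[a] / (wins[a] + losses[a])
                      if wins[a] + losses[a] > 0 else float("inf")
                      for a in range(n_arms)]

        # Pick the arm with the highest score (first index wins ties).
        arm = max(range(n_arms), key=scores.__getitem__)
        reward = 1 if rng.random() < probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward

        # Expected regret of this pull relative to the best arm.
        regret += best - probs[arm]

    return regret


if __name__ == "__main__":
    # Hypothetical 10-armed task with evenly spaced success probabilities.
    arm_probs = [0.05 + 0.09 * i for i in range(10)]
    for name in ("thompson", "greedy"):
        print(name, run_bandit(name, arm_probs))
```

Regret is accumulated from the true arm probabilities rather than the sampled rewards, which keeps the comparison between the two policies less noisy for a single run.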

Waiting-monetary-value-prospect-theory

Investigating the monetary value of waiting time by incorporating Daniel Kahneman's prospect theory
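
For readers unfamiliar with prospect theory, the sketch below shows its value function, which weighs losses more heavily than gains of the same size. The default parameters (0.88, 0.88, 2.25) are the estimates reported by Tversky and Kahneman (1992). How this project maps waiting time onto gains and losses is not stated here, so treating the input as a gain or loss relative to some reference point is an assumption, and the name `prospect_value` is hypothetical.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of an outcome x measured relative to a reference point.

    x >= 0 is a gain, x < 0 is a loss. alpha and beta control diminishing
    sensitivity; lam > 1 is the loss-aversion coefficient.
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)


if __name__ == "__main__":
    # A loss of 10 units is felt more strongly than a gain of 10 units.
    print(prospect_value(10.0), prospect_value(-10.0))
```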
