This repository contains the source code for the paper "Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes" by Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, and Rahul Jain. The paper was accepted at ICML 2020, and the arXiv version can be found here.
The paper proposes two model-free algorithms for tabular MDPs. The first algorithm, Optimistic Discounted Q-learning, achieves a regret bound of O(T^(2/3)) in weakly communicating MDPs; the second algorithm, MDP-OOMD, achieves a regret bound of O(T^(1/2)) in ergodic MDPs.
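To illustrate the flavor of the first algorithm, here is a minimal sketch (not the authors' code) of optimistic discounted Q-learning on a toy MDP: Q-learning with a discount factor close to 1, a decaying step size, an optimism-inducing exploration bonus, and monotone value estimates. The horizon parameter `H`, the bonus constant `c`, and the toy MDP itself are illustrative assumptions, not taken from the paper or this repository.

```python
import numpy as np

def optimistic_discounted_q_learning(P, R, T, H=10.0, c=1.0, seed=0):
    """Sketch of optimistic discounted Q-learning (illustrative only).

    P: transition kernel of shape (S, A, S); R: reward table of shape (S, A)
    with rewards in [0, 1]; T: number of steps. H and c are hypothetical
    hyperparameters standing in for the horizon and bonus scale.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    gamma = 1.0 - 1.0 / H            # effective discount factor
    Q = np.full((S, A), H)           # optimistic initialization
    Qhat = Q.copy()                  # monotone (clipped) estimates used to act
    n = np.zeros((S, A), dtype=int)  # visit counts per state-action pair
    s, total_reward = 0, 0.0
    for _ in range(T):
        a = int(np.argmax(Qhat[s]))             # act greedily w.r.t. Qhat
        r = R[s, a]
        s_next = rng.choice(S, p=P[s, a])
        n[s, a] += 1
        tau = n[s, a]
        alpha = (H + 1.0) / (H + tau)           # decaying learning rate
        bonus = c * np.sqrt(H / tau)            # exploration bonus
        target = r + gamma * Qhat[s_next].max() + bonus
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
        Qhat[s, a] = min(Qhat[s, a], Q[s, a])   # keep estimates non-increasing
        total_reward += r
        s = s_next
    return total_reward / T

# Toy 2-state, 2-action MDP where action 1 yields higher reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.9, 0.1], [0.1, 0.9]]])
R = np.array([[0.1, 1.0],
              [0.2, 0.9]])
avg_reward = optimistic_discounted_q_learning(P, R, T=5000)
```

The sketch omits details of the paper's analysis (e.g. the exact choice of bonus and discount as functions of T); it is meant only to convey the algorithmic template of discounted Q-learning with optimism.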
The code was implemented jointly by Mehdi Jafarnia-Jahromi, Hiteshi Sharma, and Chen-Yu Wei.