This repository contains the source code for the paper "Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes" by Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, and Rahul Jain. The paper was accepted at ICML 2020, and the arXiv version can be found here.
The paper proposes two model-free algorithms for tabular MDPs. The first algorithm, Optimistic Discounted Q-learning, achieves a regret bound of O(T^(2/3)) in weakly communicating MDPs; the second algorithm, MDP-OOMD, achieves a regret bound of O(T^(1/2)) in ergodic MDPs.
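To illustrate the flavor of the first algorithm, here is a minimal sketch (not the authors' code) of optimistic discounted Q-learning on a toy MDP: Q-learning with a discount factor close to 1, a decaying step size, an optimism-inducing exploration bonus, and monotone value estimates. The horizon parameter `H`, the bonus constant `c`, and the toy MDP itself are illustrative assumptions, not taken from the paper or this repository.

```python
import numpy as np

def optimistic_discounted_q_learning(P, R, T, H=10.0, c=1.0, seed=0):
    """Sketch of optimistic discounted Q-learning (illustrative only).

    P: transition kernel of shape (S, A, S); R: reward table of shape (S, A)
    with rewards in [0, 1]; T: number of steps. H and c are hypothetical
    hyperparameters standing in for the horizon and bonus scale.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    gamma = 1.0 - 1.0 / H            # effective discount factor
    Q = np.full((S, A), H)           # optimistic initialization
    Qhat = Q.copy()                  # monotone (clipped) estimates used to act
    n = np.zeros((S, A), dtype=int)  # visit counts per state-action pair
    s, total_reward = 0, 0.0
    for _ in range(T):
        a = int(np.argmax(Qhat[s]))             # act greedily w.r.t. Qhat
        r = R[s, a]
        s_next = rng.choice(S, p=P[s, a])
        n[s, a] += 1
        tau = n[s, a]
        alpha = (H + 1.0) / (H + tau)           # decaying learning rate
        bonus = c * np.sqrt(H / tau)            # exploration bonus
        target = r + gamma * Qhat[s_next].max() + bonus
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
        Qhat[s, a] = min(Qhat[s, a], Q[s, a])   # keep estimates non-increasing
        total_reward += r
        s = s_next
    return total_reward / T

# Toy 2-state, 2-action MDP where action 1 yields higher reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.9, 0.1], [0.1, 0.9]]])
R = np.array([[0.1, 1.0],
              [0.2, 0.9]])
avg_reward = optimistic_discounted_q_learning(P, R, T=5000)
```

The sketch omits details of the paper's analysis (e.g. the exact choice of bonus and discount as functions of T); it is meant only to convey the algorithmic template of discounted Q-learning with optimism.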
The code was implemented jointly by Mehdi Jafarnia-Jahromi, Hiteshi Sharma, and Chen-Yu Wei.