lehduong / Job-Scheduling-with-Reinforcement-Learning

Learning in noisy MDPs (governed by stochastic, exogenous input processes) with an input-dependent baseline


Learning to Assign Credit in Input-driven Environment (LACIE) reduces the variance of advantage estimation in noisy MDPs by using a hindsight distribution over the input process.

Input-driven MDP

Input-driven MDPs are Markov decision processes governed not only by the agent's actions but also by stochastic, exogenous input processes [1]. These environments have inherently high variance, which makes it hard to learn an optimal policy.
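
For intuition, here is a toy example (my own illustration, not code from this repository) of an input-driven environment: a load balancer assigns incoming jobs to servers, but the job-size sequence is an exogenous input process, so the same policy can receive very different returns depending on the inputs it happens to face.

```python
import numpy as np

class ToyInputDrivenEnv:
    """Toy input-driven MDP: rewards depend on the agent's routing decisions
    AND on an exogenous job-size sequence the agent does not control."""

    def __init__(self, num_servers=2, horizon=50, seed=0):
        self.num_servers = num_servers
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.loads = np.zeros(self.num_servers)
        # Exogenous input process: job sizes are sampled independently of the policy.
        self.jobs = self.rng.exponential(scale=1.0, size=self.horizon)
        return np.concatenate([self.loads, [self.jobs[self.t]]])

    def step(self, action):
        self.loads[action] += self.jobs[self.t]           # route the job to a server
        reward = -self.loads.max()                        # penalize the busiest server
        self.loads = np.maximum(self.loads - 1.0, 0.0)    # servers drain one unit per step
        self.t += 1
        done = self.t >= self.horizon
        next_job = self.jobs[self.t] if not done else 0.0
        return np.concatenate([self.loads, [next_job]]), reward, done, {}
```

Two rollouts with the same policy but different job sequences can earn very different returns; that spread is exactly the variance an input-dependent baseline tries to remove.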

This repository implements:

  • The input-dependent baseline proposed in [1] (see the sketch after this list).

  • LACIE, an algorithm that learns to weight the advantages of each rollout in hindsight with respect to future input sequences.
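
As a rough illustration of the first point, the sketch below conditions the value baseline on the exogenous input sequence of a rollout, so the part of the return explained by the inputs is subtracted out of the advantage. This is only a minimal sketch of the idea in [1]; the class and argument names are mine and do not correspond to this repository's actual modules.

```python
import torch
import torch.nn as nn

class InputDependentBaseline(nn.Module):
    """Value baseline conditioned on both the state and the exogenous
    input sequence, so advantages are computed relative to what was
    achievable given that particular input realization."""

    def __init__(self, state_dim, input_dim, hidden_dim=64):
        super().__init__()
        # Summarize the rollout's input sequence with an LSTM.
        self.input_encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.value_head = nn.Sequential(
            nn.Linear(state_dim + hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, input_sequence):
        # state: (batch, state_dim); input_sequence: (batch, seq_len, input_dim)
        _, (h, _) = self.input_encoder(input_sequence)
        value = self.value_head(torch.cat([state, h[-1]], dim=-1))
        return value.squeeze(-1)

# Advantage with an input-dependent baseline:
#   A(s_t, a_t) = G_t - V(s_t, z), where z encodes the rollout's input sequence.
```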

Install Dependencies

  1. Install PyTorch
pip install torch torchvision
  2. Install TensorFlow 2
pip install tensorflow==2.2

or

pip install tensorflow-gpu==2.2
  3. Install OpenAI Baselines (the TensorFlow 2 branch)
git clone https://github.com/openai/baselines.git -b tf2 && \
cd baselines && \
pip install -e .

Note: I haven't tested the code with TensorFlow 1 yet, but it should work as well.

  4. Install the Park platform. I modified the platform slightly to make it compatible with OpenAI Baselines.
git clone https://github.com/lehduong/park &&\
cd park && \
pip install -e .

Run experiments

See the scripts directory for example commands.

Results:

Reward of A2C+LACIE (yellow) vs. A2C (blue) during training: [reward plot]

Value loss of A2C+LACIE (yellow) vs. A2C (blue) during training: [value-loss plot]

Reference

[1] Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, and Mohammad Alizadeh. Variance Reduction for Reinforcement Learning in Input-Driven Environments. ICLR 2019.

Acknowledgement

The starter code is based on ikostrikov's repository.

License: Apache License 2.0

