spokV / IB_FPSRL

Prototype to benchmark the fuzzy particle swarm reinforcement learning (Hein, Hentschel, Runkler, Udluft 2017) algorithm on the industrial benchmark

Industrial Benchmark for Fuzzy Particle Swarm Reinforcement Learning

This repository benchmarks Fuzzy Particle Swarm Reinforcement Learning (FPSRL), the approach to control policy optimization proposed by Hein et al., 2017. It does so by applying the benchmarking procedure the authors themselves used in Hein, Udluft, Runkler, 2018 to evaluate a similar approach, one that follows the same idea but uses genetic programming instead of particle swarm optimization.

Approach

The FPSRL approach tackles the problem of finding control policies for real-world systems by machine learning. We recommend reading the paper for a detailed explanation, but in short, the approach takes the following steps:

  1. Gather real-world data of the system (e.g. an industrial plant) you want to generate a control strategy for, simply by measuring its performance
  2. Generalize this data into a world model, turning the gathered data into a reward function
  3. Use particle swarm optimization to find a policy that is optimal with respect to this generalized world model
  4. If the policy's performance on the generalized model is sufficient, evaluate the policy in the real world

Throughout this repository, we use Keras (https://keras.io/) with the TensorFlow (https://www.tensorflow.org/) backend for world model generation and PySwarms (https://github.com/ljvmiranda921/pyswarms) by Miranda, 2018 for policy optimization.
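
To make step 3 above more concrete, here is a minimal sketch of optimizing a policy's parameter vector with PySwarms against a world-model rollout. The cost function, parameter dimensionality and swarm hyperparameters below are illustrative assumptions, not the settings used by the scripts in this repository.

```python
import numpy as np
import pyswarms as ps

N_PARAMS = 20  # assumed size of the fuzzy policy's parameter vector


def rollout_cost(flat_params):
    """Stand-in for a world-model rollout: return the accumulated cost of
    simulating the policy defined by flat_params. In this repository the
    rollout would run against the Keras world models; here we simply
    penalize large parameters so the example stays self-contained."""
    return float(np.sum(flat_params ** 2))


def swarm_cost(particles):
    # PySwarms passes an array of shape (n_particles, dimensions) and
    # expects one cost value per particle in return.
    return np.array([rollout_cost(p) for p in particles])


optimizer = ps.single.GlobalBestPSO(
    n_particles=50,
    dimensions=N_PARAMS,
    options={'c1': 0.5, 'c2': 0.3, 'w': 0.9},  # illustrative PSO hyperparameters
)
best_cost, best_params = optimizer.optimize(swarm_cost, iters=200)
```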

The network topology used for generating the world model was described in Duell, Udluft, Sterzing, 2012.
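
As a rough illustration only (not the exact topology from Duell et al.), a recurrent world model in Keras that predicts the next value of one benchmark output from a window of past observations and actions could look like the following; the window length, feature count and layer sizes are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 30      # assumed number of past time steps fed to the model
N_FEATURES = 7   # assumed number of observable state and action variables

# Simple recurrent regressor: a sequence of past (state, action) vectors in,
# a prediction of one output variable out.
model = keras.Sequential([
    layers.SimpleRNN(32, input_shape=(WINDOW, N_FEATURES)),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# x: (n_samples, WINDOW, N_FEATURES), y: (n_samples, 1), e.g. built from the
# data points written by gen_dataset.py; random data keeps the sketch runnable.
x = np.random.rand(256, WINDOW, N_FEATURES).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```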

Industrial Benchmark

The industrial benchmark (https://github.com/siemens/industrialbenchmark) provides a benchmarking environment modeled after real-world industrial plants. It generates output from a high-dimensional and hidden state space, only a part of which is observable. See the repository and the papers linked there for a more detailed explanation.

In short: in this repository we use the industrial benchmark to a) generate data to create a world model, thus simulating real-world runs of some industrial plant, and b) evaluate the generated policies.

As the industrial benchmark has two outputs, we generate a world model for each output separately.
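
As a very rough sketch of part a), the following shows the general shape of data collection with a random exploration policy. The BenchmarkEnv class used here is a hypothetical stand-in; the actual interface lives in the industrialbenchmark submodule and is wrapped by gen_dataset.py.

```python
import numpy as np


class BenchmarkEnv:
    """Hypothetical stand-in for the industrial benchmark: a handful of
    observable variables, 3-dimensional actions in [-1, 1], scalar cost."""

    def reset(self, setpoint):
        self._state = np.random.rand(6)
        return self._state

    def step(self, action):
        self._state = np.clip(self._state + 0.1 * np.random.randn(6), 0.0, 1.0)
        cost = float(np.sum(self._state[:2]))
        return self._state, cost


env = BenchmarkEnv()
observations, actions, costs = [], [], []
obs = env.reset(setpoint=50)
for _ in range(1000):
    action = np.random.uniform(-1.0, 1.0, size=3)  # random exploration
    next_obs, cost = env.step(action)
    observations.append(obs)
    actions.append(action)
    costs.append(cost)
    obs = next_obs

# Serialize as numpy arrays, analogous to what the scripts write into /data.
np.save('observations.npy', np.array(observations))
np.save('actions.npy', np.array(actions))
np.save('costs.npy', np.array(costs))
```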

Implementation

This repository serves various scripts that implement parts of the FPSRL approach.

| Script | Purpose |
| --- | --- |
| gen_dataset.py | Generates a set of data points using the industrial benchmark submodule so that a world model can be inferred. |
| ib_world_model.py | Takes the data points generated by the previous script and generalizes them into a function by predicting a time series of data points. |
| ib_policy.py | Takes one world model per output of the industrial benchmark and tries to find a cost-optimal policy for the industrial benchmark. |

The eval_*.py scripts evaluate the performance of a single-output world model or a policy respectively.

All scripts in this repository are command-line executables, meaning they take arguments and can be run separately. Most of them take a config file as positional argument plus the optional arguments -c, --clean, which ignores cached results for this script, and -C, --strict-clean, which ignores cached results for this script and all other scripts called recursively.
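
A minimal sketch of how such a command-line interface could be wired up with argparse; the actual scripts may differ in detail, but the flag semantics follow the description above.

```python
import argparse

parser = argparse.ArgumentParser(description='One step of the FPSRL pipeline.')
parser.add_argument('config', help='path to a configuration file')
parser.add_argument('-c', '--clean', action='store_true',
                    help='ignore cached results for this script')
parser.add_argument('-C', '--strict-clean', action='store_true',
                    help='ignore cached results for this script and all '
                         'scripts called recursively')
args = parser.parse_args()
```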

Configuration files

All policy-related scripts take a policy config, whereas all other scripts take a data config as their positional argument. You can find example configuration files (which are also used as defaults) in misc/dicts.py.

See the inline documentation of this repository for details.

Execution

You can create a policy by running these commands one after another in your shell:

```
pipenv install
python ib_policy.py cfg/policy.json -C
```

The pipenv command installs all necessary dependencies in a clean virtual environment, and ib_policy.py recursively calls all the other scripts it needs.

After that, the following directories will have been created:

| Name | Description |
| --- | --- |
| /data | Holds the data used for training the world models as serialized numpy arrays. Use numpy.load to deserialize them. |
| /models | Holds one world model per output variable of the industrial benchmark. Use keras.models.load_model with the custom_objects option set accordingly to deserialize them. |
| /policies | Holds the policies as serialized numpy arrays. Again, use numpy.load to deserialize them and the update method of the Policy class in policy.py to load them. |
| /evaluation | Holds metrics for every world model and the generated policy. It also contains example prediction graphs for every world model, where pX denotes the initial setpoint the industrial benchmark was initialized with. |
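
A short sketch of how these artifacts could be loaded back; the file names and the custom_objects contents are assumptions, see the table above and policy.py for the actual details.

```python
import numpy as np
from tensorflow import keras

# Training data and policies are plain serialized numpy arrays.
data = np.load('data/some_dataset.npy')               # hypothetical file name
policy_params = np.load('policies/some_policy.npy')   # hypothetical file name

# World models are Keras models; custom_objects must list any custom layers,
# losses or metrics the model was saved with (left empty here).
world_model = keras.models.load_model('models/some_model.h5', custom_objects={})

# The Policy class in policy.py exposes an update method to load the
# serialized parameters; its constructor arguments are not shown here.
# from policy import Policy
# policy = Policy(...)
# policy.update(policy_params)
```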

References

Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft,
Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies,
Engineering Applications of Artificial Intelligence, Volume 65, 2017, Pages 87-98, ISSN 0952-1976,
https://doi.org/10.1016/j.engappai.2017.07.005.

Daniel Hein, Steffen Udluft, Thomas A. Runkler,
Interpretable policies for reinforcement learning by genetic programming,
Engineering Applications of Artificial Intelligence, Volume 76, 2018, Pages 158-169, ISSN 0952-1976,
https://doi.org/10.1016/j.engappai.2018.09.007.

Lester James V. Miranda,
PySwarms: a research toolkit for Particle Swarm Optimization in Python,
Journal of Open Source Software, Volume 3 (21), 2018, 433,
https://doi.org/10.21105/joss.00433.

Siegmund Duell, Steffen Udluft, Volkmar Sterzing,
Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks,
In: Grégoire Montavon, Geneviève B. Orr, Klaus-Robert Müller (eds),
Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science,
vol 7700, Pages 709-733, Springer, Berlin, Heidelberg, 2012.
