spokV / IB_FPSRL

Prototype to benchmark the fuzzy particle swarm reinforcement learning (Hein, Hentschel, Runkler, Udluft 2017) algorithm on the industrial benchmark

Industrial Benchmark for Fuzzy Particle Swarm Reinforcement Learning

This repository benchmarks Fuzzy Particle Swarm Reinforcement Learning (FPSRL), the approach to control policy optimization proposed by Hein et al., 2017. It does so by applying the benchmarking procedure the authors themselves used in Hein, Udluft, Runkler, 2018 to evaluate a similar approach, one that follows the same idea but uses genetic programming instead of particle swarm optimization.

Approach

The FPSRL approach tackles the problem of finding control policies for real-world systems by machine learning. We recommend reading the paper for a detailed explanation, but in short, the approach takes the following steps:

  1. Gather real-world data of the system (e.g. an industrial plant) you want to generate a control strategy for, simply by measuring its performance
  2. Generalize this data into a world model, turning the gathered data into a reward function
  3. Use particle swarm optimization to find a policy that is optimal with respect to this generalized world model
  4. If the policy's performance on the generalized model is sufficient, evaluate the policy in the real world

Throughout this repository, we use Keras (https://keras.io/) with the TensorFlow (https://www.tensorflow.org/) backend for world model generation and PySwarms (https://github.com/ljvmiranda921/pyswarms) by Miranda, 2018 for policy optimization.
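
To make step 3 above more concrete, here is a minimal sketch of optimizing a policy's parameter vector with PySwarms against a world-model rollout. The cost function, parameter dimensionality and swarm hyperparameters below are illustrative assumptions, not the settings used by the scripts in this repository.

```python
import numpy as np
import pyswarms as ps

N_PARAMS = 20  # assumed size of the fuzzy policy's parameter vector


def rollout_cost(flat_params):
    """Stand-in for a world-model rollout: return the accumulated cost of
    simulating the policy defined by flat_params. In this repository the
    rollout would run against the Keras world models; here we simply
    penalize large parameters so the example stays self-contained."""
    return float(np.sum(flat_params ** 2))


def swarm_cost(particles):
    # PySwarms passes an array of shape (n_particles, dimensions) and
    # expects one cost value per particle in return.
    return np.array([rollout_cost(p) for p in particles])


optimizer = ps.single.GlobalBestPSO(
    n_particles=50,
    dimensions=N_PARAMS,
    options={'c1': 0.5, 'c2': 0.3, 'w': 0.9},  # illustrative PSO hyperparameters
)
best_cost, best_params = optimizer.optimize(swarm_cost, iters=200)
```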

The network topology used for generating the world model was described in Duell, Udluft, Sterzing, 2012.
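
As a rough illustration only (not the exact topology from Duell et al.), a recurrent world model in Keras that predicts the next value of one benchmark output from a window of past observations and actions could look like the following; the window length, feature count and layer sizes are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 30      # assumed number of past time steps fed to the model
N_FEATURES = 7   # assumed number of observable state and action variables

# Simple recurrent regressor: a sequence of past (state, action) vectors in,
# a prediction of one output variable out.
model = keras.Sequential([
    layers.SimpleRNN(32, input_shape=(WINDOW, N_FEATURES)),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# x: (n_samples, WINDOW, N_FEATURES), y: (n_samples, 1), e.g. built from the
# data points written by gen_dataset.py; random data keeps the sketch runnable.
x = np.random.rand(256, WINDOW, N_FEATURES).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```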

Industrial Benchmark

The industrial benchmark (https://github.com/siemens/industrialbenchmark) provides a benchmarking environment modeled after real-world industrial plants. It generates output from a high-dimensional and hidden state space, only a part of which is observable. See the repository and the papers linked there for a more detailed explanation.

In short: in this repository we use the industrial benchmark to a) generate data to create a world model, thus simulating real-world runs of some industrial plant, and b) evaluate the generated policies.

As the industrial benchmark has two outputs, we generate a world model for each output separately.
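
As a very rough sketch of part a), the following shows the general shape of data collection with a random exploration policy. The BenchmarkEnv class used here is a hypothetical stand-in; the actual interface lives in the industrialbenchmark submodule and is wrapped by gen_dataset.py.

```python
import numpy as np


class BenchmarkEnv:
    """Hypothetical stand-in for the industrial benchmark: a handful of
    observable variables, 3-dimensional actions in [-1, 1], scalar cost."""

    def reset(self, setpoint):
        self._state = np.random.rand(6)
        return self._state

    def step(self, action):
        self._state = np.clip(self._state + 0.1 * np.random.randn(6), 0.0, 1.0)
        cost = float(np.sum(self._state[:2]))
        return self._state, cost


env = BenchmarkEnv()
observations, actions, costs = [], [], []
obs = env.reset(setpoint=50)
for _ in range(1000):
    action = np.random.uniform(-1.0, 1.0, size=3)  # random exploration
    next_obs, cost = env.step(action)
    observations.append(obs)
    actions.append(action)
    costs.append(cost)
    obs = next_obs

# Serialize as numpy arrays, analogous to what the scripts write into /data.
np.save('observations.npy', np.array(observations))
np.save('actions.npy', np.array(actions))
np.save('costs.npy', np.array(costs))
```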

Implementation

This repository serves various scripts that implement parts of the FPSRL approach.

| Script | Purpose |
| --- | --- |
| gen_dataset.py | Generates a set of data points using the industrial benchmark submodule so that a world model can be inferred. |
| ib_world_model.py | Takes the data points generated by the previous script and generalizes them into a function by predicting a time series of data points. |
| ib_policy.py | Takes one world model per output of the industrial benchmark and tries to find a cost-optimal policy for the industrial benchmark. |

The eval_*.py scripts evaluate the performance of a single-output world model or a policy respectively.

All scripts in this repository are command-line executables, meaning they take arguments and can be run separately. Most of them take a config file as positional argument plus the optional arguments -c, --clean, which ignores cached results for this script, and -C, --strict-clean, which ignores cached results for this script and all other scripts called recursively.
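
A minimal sketch of how such a command-line interface could be wired up with argparse; the actual scripts may differ in detail, but the flag semantics follow the description above.

```python
import argparse

parser = argparse.ArgumentParser(description='One step of the FPSRL pipeline.')
parser.add_argument('config', help='path to a configuration file')
parser.add_argument('-c', '--clean', action='store_true',
                    help='ignore cached results for this script')
parser.add_argument('-C', '--strict-clean', action='store_true',
                    help='ignore cached results for this script and all '
                         'scripts called recursively')
args = parser.parse_args()
```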

Configuration files

All policy-related scripts take a policy config, whereas all other scripts take a data config as their positional argument. You can find example configuration files (which are also used as defaults) in misc/dicts.py.

See the inline documentation of this repository for details.

Execution

You can create a policy by running these commands one after another in your shell:

```
pipenv install
python ib_policy.py cfg/policy.json -C
```

The pipenv command installs all necessary dependencies in a clean virtual environment, and ib_policy.py recursively calls all the other scripts it needs.

After that, the following directories will have been created:

| Name | Description |
| --- | --- |
| /data | Holds the data used for training the world models as serialized numpy arrays. Use numpy.load to deserialize them. |
| /models | Holds one world model per output variable of the industrial benchmark. Use keras.models.load_model with the custom_objects option set accordingly to deserialize them. |
| /policies | Holds the policies as serialized numpy arrays. Again, use numpy.load to deserialize them and the update method of the Policy class in policy.py to load them. |
| /evaluation | Holds metrics for every world model and the generated policy. It also contains example prediction graphs for every world model, where pX denotes the initial setpoint the industrial benchmark was initialized with. |
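
A short sketch of how these artifacts could be loaded back; the file names and the custom_objects contents are assumptions, see the table above and policy.py for the actual details.

```python
import numpy as np
from tensorflow import keras

# Training data and policies are plain serialized numpy arrays.
data = np.load('data/some_dataset.npy')               # hypothetical file name
policy_params = np.load('policies/some_policy.npy')   # hypothetical file name

# World models are Keras models; custom_objects must list any custom layers,
# losses or metrics the model was saved with (left empty here).
world_model = keras.models.load_model('models/some_model.h5', custom_objects={})

# The Policy class in policy.py exposes an update method to load the
# serialized parameters; its constructor arguments are not shown here.
# from policy import Policy
# policy = Policy(...)
# policy.update(policy_params)
```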

References

Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft,
Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies,
Engineering Applications of Artificial Intelligence, Volume 65, 2017, Pages 87-98, ISSN 0952-1976,
https://doi.org/10.1016/j.engappai.2017.07.005.

Daniel Hein, Steffen Udluft, Thomas A. Runkler,
Interpretable policies for reinforcement learning by genetic programming,
Engineering Applications of Artificial Intelligence, Volume 76, 2018, Pages 158-169, ISSN 0952-1976,
https://doi.org/10.1016/j.engappai.2018.09.007.

Lester James V. Miranda,
PySwarms: a research toolkit for Particle Swarm Optimization in Python,
Journal of Open Source Software, Volume 3 (21), 2018, 433,
https://doi.org/10.21105/joss.00433.

Siegmund Duell, Steffen Udluft, Volkmar Sterzing,
Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks,
In: Grégoire Montavon, Geneviève B. Orr, Klaus-Robert Müller (eds),
Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science,
vol 7700, Pages 709-733, Springer, Berlin, Heidelberg, 2012.
