flexi-framework / relexi

A scalable reinforcement learning framework for CFD on HPC systems

Home Page:https://flexi-framework.github.io/relexi/

About Relexi

Relexi is a Reinforcement Learning (RL) framework developed for the high-order HPC flow solver FLEXI. However, Relexi is designed with modularity in mind and can be used with other HPC solvers as well. Relexi builds upon TensorFlow and its RL extension TF-Agents. For efficient communication, data handling and management of the simulation runs on HPC systems, Relexi uses the SmartSim package with its SmartRedis communication clients. For details on its scaling behavior, suitability for HPC and use cases, please see the project's publications.

This is a scientific project. If you use Relexi or find it helpful, please cite the project using a suitable reference from the list above referring to either the general Relexi project, its HPC aspects or its application for scientific modeling tasks, respectively.

Installation

The following quick start guide covers a standard installation of the Relexi framework.

Dependencies

Relexi has a variety of dependencies. The main ones are listed below with their supported versions.

| Package | Version | Note |
|---|---|---|
| Python | ≥3.9 | |
| TensorFlow | 2.9 - 2.15 | |
| TF-Agents | ≥0.13 | |
| SmartSim | 0.4 - 0.6 | |
| SmartRedis | ≥0.4.1 | |
| CMake | ≥3.0 | |
| Make | ≥4.0 | |
| gcc-fortran | ≥9.4 | gcc 10 not supported! (gcc ≥11 is fine) |
| gcc | ≥9.4 | |
| gcc-c++ | ≥9.4 | |

Be aware that the major dependencies (SmartSim, TensorFlow, FLEXI) might have more extensive dependency trees of their own. Please refer to their respective documentation for details.
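To compare an existing environment against the Python package versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of Relexi: the helper names (`parse_version`, `in_range`, `check_dependencies`) are hypothetical, and the distribution names `tensorflow`, `tf-agents`, `smartsim` and `smartredis` are assumed to match what pip installed on your system (variants such as CPU-only builds may use different names).

```python
"""Report installed versions of Relexi's main Python dependencies."""
from importlib import metadata

# Supported ranges copied from the dependency table (inclusive bounds).
# None means that no upper bound is listed.
# The keys are assumed pip distribution names.
SUPPORTED = {
    "tensorflow": ((2, 9), (2, 15)),
    "tf-agents": ((0, 13), None),
    "smartsim": ((0, 4), (0, 6)),
    "smartredis": ((0, 4, 1), None),
}


def parse_version(text):
    """Turn '2.15.0' or '2.15.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version, lower, upper):
    """Check lower <= version <= upper, comparing only as many
    fields as the respective bound specifies."""
    if version[:len(lower)] < lower:
        return False
    if upper is not None and version[:len(upper)] > upper:
        return False
    return True


def check_dependencies(supported=SUPPORTED):
    """Return {package: (installed_version_or_None, within_range)}."""
    report = {}
    for name, (lower, upper) in supported.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = in_range(parse_version(installed), lower, upper)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        status = "ok" if ok else "check"
        print(f"{name:12s} {installed or 'not installed':16s} {status}")
```

The script only reads package metadata and does not import the packages, so it runs quickly even when TensorFlow is installed. The compilers, CMake and Make are not covered.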

Prerequisites

Open a terminal and change into the directory where you want to install Relexi and its dependencies. It is highly recommended to use some form of virtual environment for the installation. You can create and activate a new environment using virtualenv via

python3 -m pip install virtualenv
python3 -m virtualenv env_relexi
source env_relexi/bin/activate
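Before installing anything, it can be worth confirming that the environment is really active for the `python3` you are about to use. The following is a minimal sketch; the function name `in_virtual_environment` is hypothetical and not part of Relexi or virtualenv.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the
    # environment, while sys.base_prefix still refers to the base
    # installation. Older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; packages would be installed globally.")
```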

Install Relexi

Clone the Relexi repository and install the necessary dependencies with

git clone https://github.com/flexi-framework/relexi.git
python3 -m pip install -r relexi/requirements.txt

Install SmartSim

After the smartsim package has been installed via pip, SmartSim and its dependencies still have to be built with the smart command-line tool:

smart clobber && smart clean
smart build --no_pt

Install FLEXI

Clone the required version of FLEXI from GitHub and build it with the standard compile flags

git clone --branch smartsim --depth 1 https://github.com/flexi-framework/flexi-extensions.git
cd flexi-extensions
mkdir -p build && cd build
cmake .. -DLIBS_BUILD_HDF5=ON -DLIBS_USE_MPI=OFF -DLIBS_BUILD_SMARTREDIS=ON -DLIBS_USE_SMARTREDIS=ON -DLIBS_USE_FFTW=ON -DPOSTI=OFF -DFLEXI_TESTCASE=hit -DFLEXI_NODETYPE=GAUSS-LOBATTO -DFLEXI_SPLIT_DG=ON -DFLEXI_EDDYVISCOSITY=ON
make -j
cd ../../

Note that in this configuration FLEXI tries to install all its dependencies automatically, which can require several minutes. If HDF5 is available on the system, the compile time can be reduced significantly by switching off the corresponding flag in the CMake configuration. Moreover, this configuration compiles FLEXI without MPI and thus in its serial version. To enable MPI or to change the configuration of FLEXI, please see its official documentation for more details.
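To illustrate how individual flags can be toggled, the sketch below assembles the `cmake` call from a dictionary of options. It only uses the flags that already appear in the call above; the function name `cmake_command` is hypothetical, and the script merely prints the command rather than running it. Whether further flags are needed for an MPI build is not covered here, so please consult the FLEXI documentation.

```python
import shlex

# Flags copied from the cmake call in this guide.
DEFAULT_FLAGS = {
    "LIBS_BUILD_HDF5": "ON",
    "LIBS_USE_MPI": "OFF",
    "LIBS_BUILD_SMARTREDIS": "ON",
    "LIBS_USE_SMARTREDIS": "ON",
    "LIBS_USE_FFTW": "ON",
    "POSTI": "OFF",
    "FLEXI_TESTCASE": "hit",
    "FLEXI_NODETYPE": "GAUSS-LOBATTO",
    "FLEXI_SPLIT_DG": "ON",
    "FLEXI_EDDYVISCOSITY": "ON",
}


def cmake_command(overrides=None, source_dir=".."):
    """Build the cmake command line, replacing selected default flags."""
    flags = dict(DEFAULT_FLAGS)
    flags.update(overrides or {})
    args = ["cmake", source_dir]
    args += [f"-D{key}={value}" for key, value in flags.items()]
    return shlex.join(args)


if __name__ == "__main__":
    # Use an HDF5 installation that is already available on the system.
    print(cmake_command({"LIBS_BUILD_HDF5": "OFF"}))
    # Request MPI support (assumes an MPI installation is available).
    print(cmake_command({"LIBS_USE_MPI": "ON"}))
```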

Running the Code

Relexi comes with some example setups to verify that it is correctly installed. Enter the directory of the first test case and run Relexi.

cd relexi/examples/HIT_24_DOF/
python3 ../../src/relexi.py prm.yaml

The file prm.yaml contains the configuration for the reinforcement learning training. It can be adapted using the text editor of your choice. If you have installed the flexi binary somewhere other than the default path, adapt the path of the executable under executable_path accordingly. You may also set the number of parallel environments via num_parallel_environments according to your local hardware resources. The number of processors used for each FLEXI environment can be changed by setting num_procs_per_environment to the appropriate value. Be aware that to run FLEXI in parallel, i.e. with more than 1 CPU core per environment, it has to be compiled with MPI. Please refer to the FLEXI documentation for details.
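As a rough guide for choosing num_parallel_environments on a local machine, the number of available cores can be divided by num_procs_per_environment. The sketch below does exactly that. The function name and the `reserved_cores` parameter are assumptions made for illustration (cores set aside for Relexi itself and other processes), not settings that Relexi provides.

```python
import os


def max_parallel_environments(procs_per_environment, total_cores=None,
                              reserved_cores=1):
    """Estimate how many environments fit on the local cores.

    Always returns at least 1, even if that oversubscribes the machine.
    """
    if procs_per_environment < 1:
        raise ValueError("procs_per_environment must be at least 1")
    if total_cores is None:
        # os.cpu_count() may return None if the count cannot be determined.
        total_cores = os.cpu_count() or 1
    usable = max(total_cores - reserved_cores, 0)
    return max(usable // procs_per_environment, 1)


if __name__ == "__main__":
    for procs in (1, 2, 4):
        envs = max_parallel_environments(procs)
        print(f"num_procs_per_environment={procs}: "
              f"num_parallel_environments<={envs}")
```

The result is only an upper bound derived from the core count; memory limits and the cost of each simulation may call for a smaller value.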

Results

To visualize the results, Relexi uses the TensorBoard suite. After running the code, Relexi should create a directory logs, where the model, training checkpoints and the training metrics are saved. Open it with

tensorboard --logdir logs/

TensorBoard then provides a URL that can be opened in the browser. If the training is performed on a remote server, the port on which TensorBoard serves its data has to be forwarded to your local machine. If you use ssh to connect to the server, you can forward the standard TensorBoard port (6006) with

ssh -L 6006:127.0.0.1:6006 your_remote_server
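If the browser cannot reach TensorBoard, a quick connectivity test on the local end of the tunnel can narrow down the problem. The following minimal sketch uses only the standard library; the function name `port_is_open` is hypothetical. It only tells whether something accepts connections on the port, not whether that something is TensorBoard.

```python
import socket


def port_is_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 6006 is reachable: open http://127.0.0.1:6006 in the browser.")
    else:
        print("Nothing is listening on port 6006: check TensorBoard and the ssh tunnel.")
```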

Documentation

The documentation of Relexi is available online at https://flexi-framework.github.io/relexi/. It is built with the pdoc package, which is listed in requirements.txt and is thus already installed with Relexi. To build the documentation yourself, execute

cd docs
bash build_docs.sh

Open the resulting index.html with your browser.

Testing

A suite of unit tests is implemented for Relexi using the pytest framework. To run the tests, execute the following in the root directory of the repository:

pytest

License: GNU General Public License v3.0

