The SHAMan application
SHAMan is an out-of-the-box Web application that performs black-box auto-tuning of custom computer components running on a distributed system, for an application submitted by a user. It searches for the component parametrization that is the most efficient in terms of execution time.
Main goal and features
SHAMan is a framework to perform auto-tuning of configurable components running on HPC distributed systems. It performs the auto-tuning loop by parametrizing the component, submitting the job through the Slurm workload manager, and collecting the corresponding execution time. Using the history of parametrizations and execution times, the framework then uses black-box optimization to select the next most appropriate parametrization, until the budget of allocated runs is exhausted.
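The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not SHAMan's implementation: `run_job`, `next_parametrization` and `tune` are hypothetical names, the execution time is a synthetic function rather than a Slurm job, and plain random search stands in for the black-box heuristic.

```python
import random


def run_job(parametrization):
    """Hypothetical stand-in for submitting a job and timing it.

    A real setup would submit the job through Slurm; here a synthetic
    function plays the role of the measured execution time.
    """
    x = parametrization["param_1"]
    return (x - 7) ** 2 + 10.0


def next_parametrization(history, grid, rng):
    """Choose the next point to test.

    Assumption: random search over the points not yet visited, used
    here only as a placeholder for a real black-box heuristic.
    """
    visited = {params["param_1"] for params, _ in history}
    candidates = [v for v in grid if v not in visited] or list(grid)
    return {"param_1": rng.choice(candidates)}


def tune(grid, budget, seed=0):
    """Run the parametrize / execute / record loop for `budget` runs."""
    rng = random.Random(seed)
    history = []  # list of (parametrization, execution time) pairs
    for _ in range(budget):
        params = next_parametrization(history, grid, rng)
        elapsed = run_job(params)
        history.append((params, elapsed))
    best = min(history, key=lambda entry: entry[1])
    return best, history


best, history = tune(grid=range(0, 16), budget=16)
```

With a budget equal to the grid size, every point is visited once, so `best` is the parametrization with the lowest synthetic execution time.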
This framework integrates three state-of-the-art heuristics, as well as noise reduction strategies to deal with the possible interference of shared resources on large-scale HPC systems, and pruning strategies to limit the time spent by the optimization process.
Compared to existing software, it provides these main advantages:
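To make the two ideas concrete, here is a hedged Python sketch of one possible noise reduction strategy (re-measuring a parametrization and keeping the median) and one possible pruning rule (stopping a run that is already much slower than the best known one). The function names, the use of the median and the tolerance factor are all assumptions made for illustration; the strategies shipped with SHAMan may differ.

```python
import statistics


def resampled_time(measure, parametrization, n_repeats=3):
    """Measure the same parametrization several times and return the median.

    The median is less sensitive than the mean to an outlier caused by
    interference from other jobs sharing the resources.
    """
    samples = [measure(parametrization) for _ in range(n_repeats)]
    return statistics.median(samples)


def should_prune(elapsed_so_far, best_time, tolerance=1.5):
    """Return True once a run exceeds the best known time by a tolerance factor."""
    return best_time is not None and elapsed_so_far > tolerance * best_time


# Synthetic noisy measurements: the second one is an outlier.
_samples = iter([10.0, 42.0, 11.0])
median_time = resampled_time(lambda params: next(_samples), {"param_1": 1})
```

Here the outlier of 42.0 does not affect the aggregated value, and a run would be stopped as soon as its elapsed time passes 1.5 times the best time seen so far.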
Basic architecture
SHAMan relies on a microservice architecture and integrates:
- A front-end Web application, running with Nuxt.js
- A back-end storage database, relying on MongoDB
- An optimization engine written in Python, reached by the API through a message broker, which uses runners to perform optimization tasks
The services communicate through a REST API built with FastAPI.
Installation
SHAMan can be installed in a containerized environment, as several Docker containers run with docker-compose. However, this type of install is only suitable for demo purposes: it is not possible to infer proper performance metrics for hardware or software when the optimization engine is running in a containerized environment.
In both cases, the latest version of SHAMan must be pulled by cloning this repository. The user must then move to the cloned repository.
Demo deployment
The demo version of the application can be run by calling:
docker-compose -f demo-docker-compose.yml up
Once the application is up and running, visit localhost:3000 and check that you can access the web interface.
Production deployment
The deployment of SHAMan in production is described in the documentation.
Registering a new component
Running the command shaman-install with a YAML file describing a component registers it with the application. This YAML file must describe how the component is launched, declare its parameters, and state how they are used to parametrize the component. After the installation process, the components are available in the launch menu of the Web interface.
components:
  component_1:
    cli_command: example_1
    header: example_header
    command: example_cmd
    ld_preload: example_lib
    parameters:
      param_1:
        env_var: True
        type: int
        default: 1
      param_2:
        cmd_var: True
        type: int
        default: 100
        flag: test
      param_3:
        cli_var: True
        type: int
        default: 1
  component_2:
    ...
This component can be activated through options passed on the job's command line, a command called at the top of the script, or the setting of the LD_PRELOAD variable.
The header variable is a command written at the top of the script that is called between each optimization round, before running the job. For instance, a clear cache command is called when tuning I/O accelerators to ensure independence between runs.
The parameters, each with a default value, can be passed:
- As an environment variable (env_var=True)
- As a variable appended to the command line variable with a flag (cmd_var=True)
- As a variable passed on the job's command line (cli_var=True)
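The three ways of passing a parameter can be illustrated with a short Python sketch. It mirrors the YAML description of component_1 as a plain dictionary and splits the parameters into an environment, a command and job command-line arguments. The function `apply_parameters` is hypothetical, and the conventions it uses (upper-cased environment variable names, a `--` prefix for the flag, `name=value` for command-line arguments) are assumptions made for illustration, not SHAMan's actual behavior.

```python
# Dictionary mirroring the component_1 entry of the YAML description.
component = {
    "command": "example_cmd",
    "parameters": {
        "param_1": {"env_var": True, "type": "int", "default": 1},
        "param_2": {"cmd_var": True, "type": "int", "default": 100, "flag": "test"},
        "param_3": {"cli_var": True, "type": "int", "default": 1},
    },
}


def apply_parameters(component, values):
    """Split parameter values by the way they are passed to the component.

    Parameters missing from `values` fall back to their default.
    """
    env = {}
    command = [component["command"]]
    cli_args = []
    for name, spec in component["parameters"].items():
        value = values.get(name, spec["default"])
        if spec.get("env_var"):
            # Assumed convention: upper-cased parameter name.
            env[name.upper()] = str(value)
        elif spec.get("cmd_var"):
            # Assumed convention: the flag is prefixed with "--".
            command += [f"--{spec['flag']}", str(value)]
        elif spec.get("cli_var"):
            # Assumed convention: name=value on the job's command line.
            cli_args.append(f"{name}={value}")
    return env, " ".join(command), cli_args


env, command, cli_args = apply_parameters(component, {"param_1": 4})
```

In this example only param_1 is overridden, so param_2 and param_3 keep their default values of 100 and 1.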
For a more in-depth description of the parameters used to set up a component, go to xxx.
Creating an experiment
To launch an experiment through the /create menu of the Web interface, the optimization experiment should be parametrized by:
- Writing an application according to the Slurm sbatch format.
- Selecting the component and the parametric grid through the radio buttons (minimal, maximal and step value).
- Configuring the optimization heuristic, chosen freely among available ones. Resampling parametrization and pruning strategies can also be activated.
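As an illustration of the second step, the sketch below expands a (minimal, maximal, step) triple per parameter into the full parametric grid. The helper names are hypothetical, and treating the maximal value as inclusive is an assumption rather than documented behavior.

```python
from itertools import product


def axis(minimum, maximum, step):
    """Values from minimum to maximum, spaced by step.

    Assumption: the maximal value is included in the grid.
    """
    return list(range(minimum, maximum + 1, step))


def parametric_grid(bounds):
    """Build every combination of parameter values from per-parameter bounds."""
    names = list(bounds)
    axes = [axis(*bounds[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*axes)]


grid = parametric_grid({"param_1": (1, 5, 2), "param_2": (100, 300, 100)})
```

Each parameter takes three values here, so the grid holds nine candidate parametrizations among which the heuristic can choose.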
Once the experiment is created and launched, it is available for real-time visualization in the /launch menu of the Web interface.
For an explanation of how to launch an experiment using a command line interface, go to the documentation.
Documentation
More details about this project are accessible here.
Maintainers
If you have any questions regarding this project, please contact Sophie Robert. UI and logo are designed by Sébastien Bakirci.