Slurm-Replay replays traces of jobs scheduled on an HPC system that uses Slurm. By using the same Slurm configuration and the unmodified Slurm code base of a production HPC system, one can replay the jobs that were submitted to it. Slurm-Replay makes it possible to investigate different Slurm configurations or policies and see their impact on a production workload.
For more information, check out the
A paper describing this work appears in the proceedings of SC 2018. We would appreciate a citation.
@inproceedings{
author={M. Martinasso and M. Gila and M. Bianco and S. R. Alam and C. McMurtrie and T. C. Schulthess},
title={{RM-Replay: A High-Fidelity Tuning, Optimization and Exploration Tool for Resource Management}},
year={2018},
month={Nov.},
booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC18)},
location={Dallas, Texas},
publisher={IEEE Press},
pages={},
isbn={},
}
Slurm-Replay is published under the BSD 3-Clause License.
You are very welcome to contribute to Slurm-Replay.
If you want to contribute code, there are a few things to consider:
- a good start is to fork the repository
- use GitHub pull requests to merge your contribution
- consider documenting your code according