Slurm-Replay

Slurm-Replay replays traces of jobs scheduled on an HPC system using Slurm. Because it uses the same Slurm configuration and the unmodified Slurm code base of a production HPC system, it can replay the jobs that were actually submitted. This makes it possible to investigate different Slurm configurations or policies and see their impact on a production workload.

Documentation

For more information, check out the

Citation

A paper describing this work appears in the proceedings of SC 2018. We would appreciate a citation.

@inproceedings{
  author={M. Martinasso and M. Gila and M. Bianco and S. R. Alam and C. McMurtrie and T. C. Schulthess},
  title={{RM-Replay: A High-Fidelity Tuning, Optimization and Exploration Tool for Resource Management}},
  year={2018},
  month={Nov.},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC18)},
  location={Dallas, Texas},
  publisher={IEEE Press},
  pages={},
  isbn={},
}

License

Slurm-Replay is published under the BSD 3-clause license, see here.

Contribute

You are very welcome to contribute to Slurm-Replay.

If you want to contribute code, there are a few things to consider:

a good start is to fork the repository
use GitHub pull requests to merge your contribution
consider documenting your code according

TODO list
