eth-cscs / slurm-replay

Replay job submissions for Slurm

Slurm-Replay

Slurm-Replay replays traces of jobs scheduled on an HPC system running Slurm. By using the same Slurm configuration and the unmodified Slurm code base of a production HPC system, one can replay the jobs that were submitted to it. This makes it possible to investigate different Slurm configurations or policies and see their impact on a production workload.
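To make the idea of trace replay concrete, here is a minimal sketch in Python. It is not Slurm-Replay's implementation and does not call Slurm. The `TraceJob` record, its fields, and the `replay_schedule` helper with its `speedup` parameter are all hypothetical names chosen for illustration. The sketch only shows how recorded submission times could be turned into resubmission offsets that preserve the original order and relative spacing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TraceJob:
    """One record of a hypothetical job trace (fields are illustrative only)."""
    job_id: int
    submit_time: int  # seconds since the epoch, as recorded in the trace
    nodes: int
    duration: int     # observed run time in seconds


def replay_schedule(
    jobs: Iterable[TraceJob], speedup: float = 1.0
) -> List[Tuple[float, TraceJob]]:
    """Return (offset_seconds, job) pairs relative to the start of the replay.

    Jobs keep their original submission order; the gaps between submissions
    are divided by `speedup` (an assumed knob, not a Slurm-Replay option).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(jobs, key=lambda j: (j.submit_time, j.job_id))
    if not ordered:
        return []
    start = ordered[0].submit_time
    return [((j.submit_time - start) / speedup, j) for j in ordered]


# A tiny made-up trace, deliberately out of order.
trace = [
    TraceJob(job_id=2, submit_time=1_000_120, nodes=4, duration=600),
    TraceJob(job_id=1, submit_time=1_000_000, nodes=16, duration=3600),
    TraceJob(job_id=3, submit_time=1_000_300, nodes=1, duration=60),
]

for offset, job in replay_schedule(trace, speedup=10.0):
    print(f"t+{offset:.1f}s: submit job {job.job_id} ({job.nodes} nodes)")
```

In a real replay, each offset would drive an actual submission to a Slurm instance configured like the production system; that step is left out here on purpose.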

Documentation

For more information, check out the

Citation

There is a paper in the proceedings of SC 2018. If you use Slurm-Replay, we would appreciate a citation.

@inproceedings{
  author={M. Martinasso and M. Gila and M. Bianco and S. R. Alam and C. McMurtrie and T. C. Schulthess},
  title={{RM-Replay: A High-Fidelity Tuning, Optimization and Exploration Tool for Resource Management}},
  year={2018},
  month={Nov.},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC18)},
  location={Dallas, Texas},
  publisher={IEEE Press},
  pages={},
  isbn={},
}

License

Slurm-Replay is published under the BSD 3-clause license, see here.

Contribute

You are very welcome to contribute to Slurm-Replay.

If you want to contribute code, there are a few things to consider:

  • a good start is to fork the repository
  • use GitHub pull requests to merge your contribution
  • consider documenting your code according

TODO list

About

License: BSD 3-Clause Clear License

Languages

C 72.6%, Shell 12.2%, Python 8.7%, Dockerfile 4.0%, Makefile 2.5%