hsgg / soupstir

soupstir - Simple Orchestra of Unix Process STatistical Information Reapers - See jobs on your cluster

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

soupstir - See jobs on your cluster

'soupstir' is a collection of shell and python scripts to analyse the usage of CPU time on a heterogeneous computer cluster. Other measures have also been added, and more may come.

'soupstir' is short for for Simple Orchestra of Unix Process STatistical Information Reapers.

Getting started

To use it, edit the file "config" in this directory making sure that the 'cfree' command is executable on every host, and pointing to the 'cfree.sh' shell script included in this directory.

The hostlistfile must be a list of all hosts, one per line, as given to the 'slogin' command.

Then you can run

    ./all.sh | tee log | ./statistics.sh

to gather data. It executes '$cfree' on every host, saves it to the file 'log', and then displays statistics such as a histogram of how many jobs each user is running, the number of unreachable hosts, the number of cores, and the number of overcommitted and unniced jobs. The user 'freecores none' is a placeholder for cores where no job is running on, so you can easily check that by running

    cat log | grep "Number of free cores"

where I have reused the log file from before, but, of course, that could also be replaced by './all.sh' to get fresh data.

See also the 'DESIGN' document, especially for troubleshooting.

What you need

The following requirements must be met on every node: - Remote login via slogin (part of ssh) without asking for a password (like using a key) - /bin/sh - hostname - ps - grep - sed - wc - echo So, basically a very, very standard UNIX-like OS.

On the host you want to analyse stuff, you additionally need: - python (probably <3) - slogin (symlink to ssh) - tr - cut - paste - bc - cat - dirname - rm - sort - diff Apart from 'python' and perhaps 'diff' or 'slogin', I'd be very surprised if you don't already have those.

What more?

Please keep in mind that this software is still very early in its development, I can't guarantee for anything.

In particular, the definition of a job may not fit your requirements. A job is defined as a process or thread that lasts more than 59secs walltime, and gets more than 10% CPU time averaged over its entire lifetime. If you have a better way to determine what a job is, please speak up.

Greetings, Henry

PS: If you must, you can email me at hsggebhardt@googlemail.com.

About

soupstir - Simple Orchestra of Unix Process STatistical Information Reapers - See jobs on your cluster

License:GNU General Public License v3.0


Languages

Language:Shell 57.1%Language:Python 42.9%