fasrc / slurm-diamond-collector

A collection of diamond collectors for slurm.

slurm-diamond-collectors

A collection of custom diamond collectors to gather various slurm stats.

Description

These collectors are intended to be used with diamond to ship stats to graphite. Each collector gathers data on a different aspect of slurm. Feel free to add to or modify these collectors to suit your needs.

SlurmSchedStatsCollector

This collector is a diamond port of the sdiag graphing approach described here:

http://giovannitorres.me/graphing-sdiag-with-graphite.html

It collects sdiag stats so that you can chart your scheduler performance over time.
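As a rough illustration of what collecting sdiag stats involves, the sketch below turns "label: value" lines into metric names and numbers. It is not the collector's actual code: the sample text, its values, and the function name parse_sdiag are made up for the example, and it parses a string rather than calling sdiag.

```python
import re

# Illustrative excerpt in the "label: value" style that sdiag prints.
# The numbers are invented for this example.
SAMPLE_SDIAG = """\
Server thread count: 3
Agent queue size:    0
Jobs submitted: 120
Jobs started:   98
Main schedule statistics (microseconds):
	Last cycle:   1520
	Mean cycle:   1830
"""


def parse_sdiag(text):
    """Return {metric_name: number} for every 'label: number' line."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        # Skip section headers and lines whose value is not a plain number.
        if not sep or not re.fullmatch(r"-?\d+(\.\d+)?", value):
            continue
        # Normalize the label into a graphite-friendly metric name.
        name = re.sub(r"\W+", "_", label.strip().lower()).strip("_")
        stats[name] = float(value) if "." in value else int(value)
    return stats


if __name__ == "__main__":
    for name, value in sorted(parse_sdiag(SAMPLE_SDIAG).items()):
        print(name, value)
```

The sketch keeps a flat namespace, so labels that repeat across sections of the output would overwrite each other. A real collector would likely prefix each metric with its section name.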

SlurmSshareCollector

This collector grabs the current sshare data for users. It assumes a simple two-tier fairshare hierarchy: accounts, and the users who belong to those accounts.
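The two-tier assumption can be sketched as follows: account-level rows have no user, and each user row belongs to exactly one account. The example assumes pipe-delimited sshare output with a header row. The sample rows, the "account.user" naming, and the function name user_fairshare are hypothetical and not taken from the collector itself.

```python
import csv
import io

# Made-up sample in a pipe-delimited layout with a header row.
SAMPLE_SSHARE = """\
Account|User|RawShares|NormShares|RawUsage|EffectvUsage|FairShare
root|||1.000000|1000|1.000000|
lab_a||100|0.500000|600|0.600000|
lab_a|alice|1|0.250000|400|0.400000|0.45
lab_a|bob|1|0.250000|200|0.200000|0.80
"""


def user_fairshare(text):
    """Map 'account.user' to its FairShare value, skipping account-only rows."""
    rows = csv.DictReader(io.StringIO(text), delimiter="|")
    result = {}
    for row in rows:
        account = row["Account"].strip()
        user = row["User"].strip()
        # Rows without a user describe an account, not a user of it.
        if not user or not row["FairShare"]:
            continue
        result[f"{account}.{user}"] = float(row["FairShare"])
    return result


if __name__ == "__main__":
    print(user_fairshare(SAMPLE_SSHARE))
```

A deeper account tree (accounts nested under accounts) would need the parent chain tracked as well, which this sketch deliberately leaves out.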

SlurmClusterStatusCollector

This collector pulls the current state of all the nodes in the cluster and then computes overall stats of the cluster such as number of nodes down, number of nodes in use, etc.
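Computing the overall stats amounts to counting nodes per state. The sketch below assumes one "node state" pair per line and treats trailing punctuation on a state as a flag to strip. The sample data, the flag characters, and the function name summarize_nodes are assumptions for illustration only.

```python
from collections import Counter

# Made-up node listing: one "name state" pair per line.
SAMPLE_NODES = """\
node01 idle
node02 allocated
node03 mixed
node04 down*
node05 drained
node06 allocated
"""


def summarize_nodes(text):
    """Count nodes per state and add a 'total' entry."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        # Assumption: trailing symbols are state flags, not part of the state.
        state = parts[1].rstrip("*~#!%$@^-+").lower()
        counts[state] += 1
    counts["total"] = sum(counts.values())
    return dict(counts)


if __name__ == "__main__":
    for state, count in sorted(summarize_nodes(SAMPLE_NODES).items()):
        print(state, count)
```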

SlurmJobLeaderBoard

This collector pulls in the current job information for the last hour. It then summarizes the data per user to be plugged into a leaderboard for the top users.
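The per-user summary can be sketched as a simple aggregation followed by a sort. The description above does not say which quantity the collector ranks by, so CPU-seconds is used here purely as an assumption. The job records, field names, and the function name leaderboard are hypothetical.

```python
from collections import defaultdict

# Made-up job records for the last hour.
SAMPLE_JOBS = [
    {"user": "alice", "cpus": 16, "elapsed_s": 3600},
    {"user": "alice", "cpus": 4, "elapsed_s": 1800},
    {"user": "bob", "cpus": 32, "elapsed_s": 600},
    {"user": "carol", "cpus": 8, "elapsed_s": 3600},
    {"user": "dave", "cpus": 1, "elapsed_s": 60},
]


def leaderboard(jobs, top_n=3):
    """Return the top_n (user, cpu_seconds) pairs, largest first."""
    totals = defaultdict(int)
    for job in jobs:
        totals[job["user"]] += job["cpus"] * job["elapsed_s"]
    # Sort by usage descending, then by user name to break ties.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    for user, cpu_seconds in leaderboard(SAMPLE_JOBS):
        print(user, cpu_seconds)
```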

SlurmJobWaste

This collector pulls in the current job information for the last hour. It then calculates how many TRES-seconds each job wasted, meaning how much memory and CPU the scheduler allocated to the job that the job did not actually use. Finally, it publishes a per-user summary of the unused TRES.
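One way to read the waste calculation: for CPU, waste is allocated CPU-seconds minus CPU-seconds actually consumed; for memory, it is the unused portion of the request multiplied by the job's run time. The sketch below works through that arithmetic under those assumptions. The field names, units (MB), sample jobs, and function names are hypothetical and may differ from what the collector does.

```python
from collections import defaultdict

# Made-up job records; memory is in MB, times in seconds.
SAMPLE_JOBS = [
    {"user": "alice", "alloc_cpus": 8, "elapsed_s": 1000,
     "total_cpu_s": 2000, "req_mem_mb": 16000, "max_rss_mb": 4000},
    {"user": "alice", "alloc_cpus": 2, "elapsed_s": 500,
     "total_cpu_s": 1000, "req_mem_mb": 2000, "max_rss_mb": 2000},
    {"user": "bob", "alloc_cpus": 4, "elapsed_s": 100,
     "total_cpu_s": 100, "req_mem_mb": 8000, "max_rss_mb": 1000},
]


def wasted_tres_seconds(job):
    """Return unused CPU-seconds and MB-seconds for one job."""
    elapsed = job["elapsed_s"]
    cpu_allocated_s = job["alloc_cpus"] * elapsed
    # Clamp at zero so accounting noise never yields negative waste.
    cpu_waste = max(cpu_allocated_s - job["total_cpu_s"], 0)
    mem_waste = max(job["req_mem_mb"] - job["max_rss_mb"], 0) * elapsed
    return {"cpu_s": cpu_waste, "mem_mb_s": mem_waste}


def waste_by_user(jobs):
    """Sum wasted TRES-seconds per user."""
    summary = defaultdict(lambda: {"cpu_s": 0, "mem_mb_s": 0})
    for job in jobs:
        for key, value in wasted_tres_seconds(job).items():
            summary[job["user"]][key] += value
    return {user: dict(totals) for user, totals in summary.items()}


if __name__ == "__main__":
    for user, totals in sorted(waste_by_user(SAMPLE_JOBS).items()):
        print(user, totals)
```

For the first sample job, 8 CPUs over 1000 seconds is 8000 allocated CPU-seconds, of which 2000 were used, leaving 6000 wasted.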

Usage

Copy the collectors into /usr/share/diamond/collectors, then activate them in diamond.
