ds-04 / slurm_su_bank_python3

Service Unit tracking for SLURM, python3 - based upon slurm_bank

slurm_su_bank_python3

UNDER DEVELOPMENT!!! USE AT OWN RISK

A Banking/Resource Allocation (Service Unit) tracking system for the SLURM job scheduler based upon slurm_bank created by Barry Moore (2017).

Developed on Python 3.6.8. The eventual plan is to update the popen usage for Python 3.7.

Table of Contents

  1. Why?
  2. How?
  3. Prerequisites
  4. Accounts and Associations
  5. Setup
    1. User
    2. Vars
    3. Charging
  6. Usage
    1. Operation
    2. Held accounts
    3. Adding an account
  7. Checking (Cron)
  8. Dumping the DB
  9. Useful SLURM commands

Why?

We needed a simple and robust banking system for SLURM. Barry Moore's slurm_bank met these criteria, so this project builds upon it.

This version has been updated for Python 3, and email notifications from the program itself are currently removed.

Why are email notifications removed? We plan to use another (external) system to keep track of project proposal end date and to email upon thresholds.

How?

The bank is a Python program that stores its data in an SQLite file.

Using the existing associations in your SLURM database, we use the RawUsage from sshare to monitor service units (CPU hours) on the cluster. From the documentation:

Raw Usage
The number of cpu-seconds of all the jobs that charged the account by the user.
This number will decay over time when PriorityDecayHalfLife is defined.

PriorityDecayHalfLife
This controls how long prior resource use is considered in determining how
over- or under-serviced an association is (user, bank account and cluster) in
determining job priority. The record of usage will be decayed over time, with
half of the original value cleared at age PriorityDecayHalfLife. If set to 0 no
decay will be applied. This is helpful if you want to enforce hard time limits
per association. If set to 0 PriorityUsageResetPeriod must be set to some
interval.

Therefore, in your Slurm configuration you will need:

PriorityDecayHalfLife=0-00:00:00 #No decay will be applied. This is helpful if you want to enforce hard time limits per association.
PriorityUsageResetPeriod=NONE #Never clear historic usage. The default value.
AccountingStorageEnforce=associations,limits,qos,safe #If you don't set the configuration parameters that begin with "AccountingStorage" then accounting information will not be referenced or recorded

slurm_bank.py takes care of resetting SLURM's RawUsage for the account in question. The bank has two limits:

  1. A service unit limit: How many compute hours is an account allowed to use?
    --ENFORCED by default (typically via cron script). This is the primary use case and reason for the program.

  2. A project date limit: How long does the proposal last?
    --NOT ENFORCED (typically via cron script), but capability is present, minus emailing. We plan to manage this elsewhere.

Other:

  • The bank's three month check (check 90 days before project end) is dormant here. Again, we plan to check externally.
  • Upper and lower SU check limits are defined; crossing them does not send an email but does change a value in the DB. Again, we plan to mail externally.
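
The service unit check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in slurm_bank.py: the names check_account and SuCheck are hypothetical, and it assumes that one SU corresponds to 3600 units of RawUsage (one billed CPU-hour, since RawUsage is counted in seconds).

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600  # RawUsage is reported in (billed) cpu-seconds


@dataclass
class SuCheck:
    """Result of comparing SLURM usage with the bank (hypothetical record)."""
    account: str
    used_sus: float
    percent_used: float
    hold: bool


def check_account(account, raw_usage_seconds, su_limit):
    """Compare RawUsage from sshare with the SU limit stored in the bank DB."""
    used_sus = raw_usage_seconds / SECONDS_PER_HOUR
    # Guard against a zero limit so the percentage is always defined.
    percent_used = 100.0 * used_sus / su_limit if su_limit > 0 else 100.0
    # The account is held once usage exceeds the SUs granted to it.
    return SuCheck(account, used_sus, percent_used, used_sus > su_limit)


if __name__ == "__main__":
    # RawUsage taken from the sshare example in this README.
    print(check_account("test1", 2686197, 10000))
```

With the RawUsage of 2686197 from the example tree, test1 has consumed roughly 746 SUs, so an allocation of 10000 SUs would not trigger a hold, while an allocation of 500 SUs would.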

Prerequisites

  • Python3 (tested on 3.6.8). Requirements file for pip3 included.
    • dataset: "databases for lazy people"
    • docopt: "command line arguments parser, that will make you smile"
    • datafreeze: Dump (freeze) SQL query results from a database. As per https://dataset.readthedocs.io/en/latest/api.html datafreeze is a separate module from dataset - see the Data Export section.
    • SMTP: NOT required, we plan to use external mechanism for any notifications
  • sqlite for db_print.sh script
  • SLURM: tested with 19x

Accounts and Associations

In your SLURM configuration it is envisaged that you will form a tree where multiple users are associated with an account (project), e.g.:

       Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------------------------------------------------------------------------------------
  test1                       parent    0.025000     2686197      0.999999            
    test1            user1    parent    0.025000           0      0.000000   0.545455 
    test1            user2    parent    0.025000     2587994      0.963441   0.545455 
    test1            user3    parent    0.025000       98202      0.036558   0.545455 

Above we see the test1 account has user members user{1..3}. Usage by jobs that users submit against the test1 account will propagate/accumulate, and in this example it is test1's SUs in the bank/DB that will be compared to the overall RawUsage stored by SLURM accounting.

In a project-centric regime, it is assumed you will provide SUs at the project level of the tree. Your tree may look something like:

Physics - example Department or Organisation or even a sublevel of those e.g. Project category
   |
   test1 - Project (owned by PI) < set SU's against this entity/account
       |
       User1 
       User2
       User3

Setup

User

  • Clone this repo/code onto the SLURM master node, e.g. into a new directory such as /etc/slurm_bank
  • Make the SLURM user (not root!) the owner of the program, and run it as that user.

Vars

  • py_sb_settings.py is used to set the bank's behaviour and file locations for the python code.
  • env.sh is used primarily to set up vars for the slurm_bank_cron.sh cron checks. It is also used by the db_print.sh script.

Charging

In SLURM you will need to set up billing per partition (slurm.conf), e.g. within the partition definition:

Example compute:
TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=0.0"
Example GPU:
TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=1.0"

Here, CPU=1.0 means 1 service unit per hour to use 1 core and GRES/gpu=1.0 means 1 service unit per hour to use 1 GPU card.
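
As a worked example of these weights, the sketch below estimates the SUs a job would be charged. It assumes SLURM's default behaviour of summing the weighted TRES (that is, the MAX_TRES priority flag is not set) and that the memory weight is per GB, as the "G" suffix indicates. The function names parse_billing_weights and job_sus are hypothetical helpers, not part of slurm_bank.py.

```python
def parse_billing_weights(spec):
    """Parse a TRESBillingWeights string into {tres_name: weight}."""
    weights = {}
    for item in spec.split(","):
        name, value = item.split("=")
        # Memory weights may carry a unit suffix such as "G" (per GB).
        weights[name.strip().lower()] = float(value.rstrip("KMGTkmgt"))
    return weights


def job_sus(weights, cpus, mem_gb, gpus, hours):
    """Estimate SUs as (sum of weighted TRES) multiplied by hours run."""
    rate = (
        weights.get("cpu", 0.0) * cpus
        + weights.get("mem", 0.0) * mem_gb
        + weights.get("gres/gpu", 0.0) * gpus
    )
    return rate * hours


if __name__ == "__main__":
    gpu_partition = parse_billing_weights("CPU=1.0,Mem=0.25G,GRES/gpu=1.0")
    # 4 cores, 16 GB and 1 GPU for 2 hours: (4 + 4 + 1) * 2
    print(job_sus(gpu_partition, cpus=4, mem_gb=16, gpus=1, hours=2))
```

Note that with Mem=0.25G, memory also contributes to the charge: in this sketch 16 GB costs as much per hour as 4 cores.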

Usage

Operation

After setup of py_sb_settings.py and env.sh ...

Typically most operations will take place through slurm_bank_cron.sh cron checks.

slurm_bank.py is used to manage/view SU balances for accounts stored in the DB and to release accounts that were held for exceeding their SUs.

db_print.sh is a simple script that'll quickly tell you what's going on overall by printing the entire DB table. Also consult the cron logs.

Held accounts

An account will be held if RawUsage exceeds the SUs in the bank DB.

If the account is held in SLURM you'll see an entry in the GrpTRESMins column e.g.:

           Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare                    GrpTRESMins 
------------------------------------------------------------------------------------------------------------------------ 
 test1                           parent    0.025000     2686197      0.999999                                     cpu=0 
    test1              user1     parent    0.025000           0      0.000000   0.545455                                
    test1              user2     parent    0.025000     2587994      0.963441   0.545455                                
    test1              user3     parent    0.025000       98202      0.036558   0.545455                                
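
If you want to spot held accounts programmatically, one option is to parse pipe-delimited sshare output. The sketch below is an assumption-laden illustration: the SAMPLE text is hand-written from the table above in the form sshare prints when asked for parsable output (not captured from a real cluster), and held_accounts is a hypothetical helper, not part of this repo.

```python
import csv
import io

# Hand-written sample mirroring the table above (pipe-delimited form).
SAMPLE = """
Account|User|RawShares|NormShares|RawUsage|EffectvUsage|FairShare|GrpTRESMins
test1||parent|0.025000|2686197|0.999999||cpu=0
 test1|user1|parent|0.025000|0|0.000000|0.545455|
 test1|user2|parent|0.025000|2587994|0.963441|0.545455|
 test1|user3|parent|0.025000|98202|0.036558|0.545455|
"""


def held_accounts(sshare_text):
    """Return accounts whose account-level row carries a cpu=0 limit."""
    reader = csv.DictReader(io.StringIO(sshare_text.strip()), delimiter="|")
    held = []
    for row in reader:
        # Account-level rows have an empty User column.
        is_account_row = not row["User"].strip()
        limits = [limit.strip() for limit in row["GrpTRESMins"].split(",")]
        if is_account_row and "cpu=0" in limits:
            held.append(row["Account"].strip())
    return held


if __name__ == "__main__":
    print(held_accounts(SAMPLE))
```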

Adding an account:

To add an account and SUs you simply execute slurm_bank.py e.g.

./slurm_bank.py insert test1 10000

Querying immediately after would look like this:

./slurm_bank.py get_sus test1
Account test1 has 10000 SUs

The resultant DB entry would look like this:

1|test1|10000|2022-04-08|0|0|0

Checking (Cron)

The script slurm_bank_cron.sh checks Service Units by looping through all SLURM accounts; it is anticipated you'd run this at the very least daily. If an account has exhausted its SUs, that account will be held. The hold mechanism sets the account's GrpTRESMins to 0 in SLURM. This can be changed in py_sb_settings.py
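
To make the hold mechanism concrete, the sketch below builds the kind of sacctmgr command that sets or clears a GrpTRESMins limit on an account. It is an assumption about how such a hold could be expressed, not a copy of what slurm_bank.py runs: hold_command and release_command are hypothetical names, and the commands are only built and printed, never executed. It relies on sacctmgr's convention that a value of -1 clears a previously set limit.

```python
def hold_command(account):
    """Build a sacctmgr call that blocks new usage for an account."""
    return [
        "sacctmgr", "-i", "modify", "account",
        "where", f"name={account}",
        "set", "GrpTRESMins=cpu=0",
    ]


def release_command(account):
    """Build a sacctmgr call that clears the limit again (-1 unsets it)."""
    return [
        "sacctmgr", "-i", "modify", "account",
        "where", f"name={account}",
        "set", "GrpTRESMins=cpu=-1",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass the list to
    # subprocess.run(...) only on a node where sacctmgr is available.
    print(" ".join(hold_command("test1")))
    print(" ".join(release_command("test1")))
```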

Dumping the DB

You can dump the DB to JSON and subsequently repopulate it. On repopulating, a backup JSON dump is now written to a fixed path, which is set in py_sb_settings.py

Additionally you can dump to CSV, but JSON is currently required to repopulate the sqlite DB, which is required for operation of the bank.

Useful SLURM commands

See the tree of accounts and show GrpTRESMins to see if any are held. You may wish to also consider using where account=projZZZZ

sacctmgr show assoc tree -o format=account,user,share,GrpTRESMins

See RawUsage and Share information for accounts. Also show GrpTRESMins.

sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare,GrpTRESMins

Billing rate for running job

scontrol show job <jobID> | grep -i billing

Billing rate for completed job

sacct -X --format=AllocTRES%80,Elapsed -j <jobID>

Other resources

This tool prints out the Slurm associations limits and current usage values for a user and may be worth including in your deployment:

https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits

License: MIT License