dasayan05 / condor

A Python interface to the HTCondor job scheduler

Software interface to Condor

This repository contains a software interface library for the HTCondor job scheduler. The library lets users submit jobs to an HTCondor system from a Python script running on their local machine.

Author: Ayan Das

Requirements:

  1. The Python package paramiko must be installed.
  2. You must have SSH access to the Condor login node.
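
The first requirement can be checked before running anything else. The snippet below is only a minimal sketch using the standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["paramiko"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If `paramiko` is reported as missing, installing it with `pip install paramiko` should satisfy the requirement.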

Setup:

  1. Install the Git repository as a pip package:
pip install git+https://github.com/dasayan05/condor.git

OR

  1. Clone this repository anywhere (e.g. <local/path/to/repo>):
git clone https://github.com/dasayan05/condor.git <local/path/to/repo>

Then add the following to your .bashrc (or the startup file of whichever shell you use):

export PYTHONPATH=${PYTHONPATH}:<local/path/to/repo>

OR

  1. Build and install it manually:
cd <local/path/to/repo>
python setup.py install

Usage:

Create a Python file <anything>.py and keep it in the root of your own project. The following example snippet shows the basic usage of the library:

import os
from condor import condor, Job, Configuration, Grid

# Provide required configuration of machine
conf = Configuration(universe='docker', # OR 'vanilla'
    # full container tag from DockerHub or 'registry.eps.surrey.ac.uk'
    docker_image='pytorch/pytorch:1.7.0-cuda11.0-cudnn8-runtime',
    # any extra folder to mount in docker; project space will be auto mounted :)
    extra_mounts=['/vol/vssp'],
    request_CPUs=1,
    request_GPUs=1,
    gpu_memory_range=[8000,24000],
    cuda_capability=5.5,
    # following two lists must not overlap
    restricted_machines=['bad.server.com', 'worse.server.com'], # not allowed to run on these
    allowed_machines=['favmachine.server.com'] # can ONLY run on these machines
)

# This is the (example) job to be submitted.
# python classifier.py --base ./ --root ${STORAGE}/datasets/quickdraw --batch_size 64 --n_classes 3 --epochs 5 --modelname clsc3f7g10

with condor('condor', project_space='myProject') as sess:
    # Open a session to condor login node with hostname 'condor'.
    # Set up password-less SSH, otherwise it will ask for a password
    # every time this 'with .. as' block is encountered.
    # Also, provide the name of your project space folder. It is required.

    # easy grid search with 'Grid', access each variable with '.<name>' .. OR
    # we can de-structure them in-place (make sure the order is same)
    for (batch_size, learning_rate) in Grid(batch_size=[8, 16, 32, 64], lr=[1e-2, 1e-3]): # submit a bunch of jobs

        tag = f'MyAwesomeJob_batch_{batch_size}'

        # It will autodetect the full path of your python executable
        j = Job('/opt/conda/bin/python', # if docker, use absolute path to specify executables inside container
            'classifier.py',
            # all arguments to the executable should be in the dictionary as follows.
            # an entry 'epochs=30' in the dict will appear as 'python <file>.py --epochs 30'
            arguments=dict(
                base=os.getcwd(),
                root=os.environ['STORAGE'] + '/datasets/quickdraw',
                batch_size=batch_size, # Here's the looped variable 'batch_size'
                learning_rate=learning_rate,
                n_classes=3,
                epochs=30,
                modelname='clsc3f7g10'
            ),
            # some extra arguments for Job()
            can_checkpoint=True,
            approx_runtime=2, # in hours
            tag=tag, # give a cool name
            # puts all log files inside this directory (will be created if it doesn't exist)
            # for job specific directory use job-specific parameters to create the path;
            # otherwise, use a job-agnostic directory e.g. './junk'
            artifact_dir=f'./junk/{tag}'
        )

        # finally submit it
        job_id = sess.submit(j, conf)
        print(f'Submitted Job ID: {job_id}')
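
To make the loop above easier to picture, here is a rough standalone sketch of the two ideas it relies on: enumerating every combination of the grid values, and turning an arguments dictionary into `--key value` pairs. This is not the library's implementation. The helpers `grid` and `to_cli` are hypothetical names, and unlike the real `Grid` (which supports attribute access and tuple de-structuring) this sketch yields plain dictionaries.

```python
import itertools
import shlex


def grid(**axes):
    """Yield one dict per combination of the given value lists (cartesian product)."""
    names = list(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def to_cli(arguments):
    """Flatten {'epochs': 30} into ['--epochs', '30'], keeping insertion order."""
    parts = []
    for key, value in arguments.items():
        parts += [f"--{key}", str(value)]
    return parts


if __name__ == "__main__":
    # 4 batch sizes x 2 learning rates = 8 command lines
    for point in grid(batch_size=[8, 16, 32, 64], learning_rate=[1e-2, 1e-3]):
        args = to_cli({**point, "n_classes": 3, "epochs": 30})
        print(shlex.join(["python", "classifier.py", *args]))
```

With the values from the example, the grid contains eight points, so the loop would submit eight jobs.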

NOTE: Setting up password-less SSH to your Condor login node is recommended. Without it, you will be prompted for your password every time a session is opened.
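
One way to confirm that password-less SSH works is to attempt a non-interactive login. The sketch below assumes an OpenSSH client is installed locally and uses its `BatchMode` option, which makes `ssh` fail instead of prompting for a password. The function names are hypothetical and not part of this library; the hostname 'condor' is taken from the usage example.

```python
import subprocess


def batch_mode_ssh_command(host, remote_command="true"):
    """Build an ssh command line that fails rather than prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote_command]


def has_passwordless_ssh(host):
    """Return True if `host` accepts a non-interactive (key-based) login."""
    try:
        result = subprocess.run(
            batch_mode_ssh_command(host), capture_output=True, timeout=15
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ssh is not installed, or the host did not answer in time
        return False
    return result.returncode == 0


# Example (performs a real connection attempt):
# print(has_passwordless_ssh("condor"))
```

If the check returns False, generating a key pair with `ssh-keygen` and copying the public key to the login node with `ssh-copy-id` is the usual remedy.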

License: MIT License
