When you need to run several jobs concurrently, scheduling them is often a conundrum.
Some jobs finish quickly; others take a long time.
Splitting the jobs evenly into groups makes them easy to run from a few shell scripts, but the groups rarely finish at the same time.
For instance, suppose you have 8 jobs to run and want to run 4 at a time, one script per group:
Script 1:
- short-job-01.sh
- short-job-02.sh
Script 2:
- long-job-03.sh
- long-job-04.sh
Script 3:
- short-job-05.sh
- long-job-06.sh
Script 4:
- short-job-07.sh
- short-job-08.sh
Scripts 1, 3, and 4 may be done long before Script 2 completes.
In fact, long-job-03, the first job in Script 2, may still be running after the others have all finished.
You then still have to wait for long-job-04, which cannot start until long-job-03 completes.
Using jobrun allows running N jobs concurrently, and starting more jobs as others complete.
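
The idea can be sketched in a few lines of Bash. This is a minimal illustration of a bounded job pool, not how jobrun itself is implemented; `run_pool` is a hypothetical helper name, and `wait -n` requires Bash 4.3 or later.

```shell
#!/usr/bin/env bash
# Sketch of a bounded job pool. run_pool is a hypothetical helper,
# not part of jobrun. Requires Bash 4.3+ for 'wait -n'.
# Usage: run_pool MAX_JOBS "command 1" "command 2" ...
run_pool() {
  local max_jobs=$1
  shift
  local running=0 cmd
  for cmd in "$@"; do
    # Pool full: block until any one running job exits, freeing a slot.
    # The job's own exit status is ignored in this sketch.
    if [ "$running" -ge "$max_jobs" ]; then
      wait -n || true
      running=$((running - 1))
    fi
    bash -c "$cmd" &
    running=$((running + 1))
  done
  # All jobs have been started; wait for the ones still running.
  wait
}
```

Called as `run_pool 4` followed by the eight job scripts from the example above, it would start a fifth job as soon as any of the first four exits, rather than waiting for a whole group to finish.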
- jobrun.conf: defaults for jobrun.pl
- jobrun.pid: a PID file created when jobrun.pl starts
- jobrun.pl: the main Perl script
- jobrun.sh: alternative Bash script
- jobs.conf: configure some jobs to run
- test-job.sh: a test job

Logic for controlling semaphores and job creation
Start a session:
```text
$ ./jobrun.pl --maxjobs 1 --nodebug --noverbose
jobrun.pl
parent pid: 1038440
:parent:1038440 main loop
parent:1038440 Number of jobs left to run: 9
parent:1038440 sending job: ./test-job.sh job-2 15
JOB: job-2: ./test-job.sh job-2 15
child:1038440 cmd:./test-job.sh job-2 15
```
After a bit, you decide to run 9 jobs concurrently, and to enable --verbose and --debug.
These values are already set in the jobrun.conf file, so send HUP to have the config file reloaded and applied.
$ kill -1 $(cat jobrun.pid)
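
The reload-on-HUP pattern can be sketched in Bash as follows. jobrun.pl is Perl and its internals are not shown here, so this is only an illustration of the pattern. It assumes a config file of `key: value` lines (the real jobrun.conf syntax may differ), and `load_conf` and `apply_pending_reload` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of reload-on-HUP. ASSUMPTION: the config file holds "key: value"
# lines; the real jobrun.conf syntax may differ. load_conf and
# apply_pending_reload are hypothetical helpers, not jobrun functions.
MAXJOBS=1
VERBOSE=0
DEBUG=0
reload_requested=0

load_conf() {
  local file=$1 key val
  while IFS=':' read -r key val; do
    val=${val# }                 # drop the space after the colon
    case $key in
      maxjobs) MAXJOBS=$val ;;
      verbose) VERBOSE=$val ;;
      debug)   DEBUG=$val ;;
    esac
  done < "$file"
}

# The handler only sets a flag; the main loop does the actual reload
# at a safe point between iterations.
trap 'reload_requested=1' HUP

apply_pending_reload() {
  if [ "$reload_requested" -eq 1 ]; then
    reload_requested=0
    load_conf "$1"
  fi
}
```

A main loop would call `apply_pending_reload jobrun.conf` once per iteration, so a HUP sent while jobs are running takes effect before the next batch of jobs is started.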
$ ./jobrun.sh -n -i 2 -m 3 -s logs-sh -t jobrun-sh
The following options override the jobrun.conf configuration file:
- '-n': debug off
- '-i 2': set the loop interval to 2 seconds
- '-m 3': set the max number of concurrent jobs to 3
- '-s logs-sh': the name of the log directory
- '-t jobrun-sh': log file base name
Add '-y' for the dry run option:
```text
$ ./jobrun.sh -n -i 2 -m 3 -s logs-sh -t jobrun-sh $@ -y
interval seconds: 2
############################################################
## getKV jobrun.conf
############################################################
key: logdir val: ./logs
key: logfile val: jobrun-sem
key: verbose val: 1
key: logfile-suffix val: log
key: maxjobs val: 4
key: debug val: 1
key: iteration-seconds val: 9
############################################################
## getKV jobs.conf
############################################################
key: job-10 val: ./test-job.sh job-10 10
key: job-1 val: ./test-job.sh job-1 10
key: job-3 val: ./test-job.sh job-3 10
key: job-2 val: ./test-job.sh job-2 10
key: job-5 val: ./test-job.sh job-5 10
key: job-4 val: ./test-job.sh job-4 10
key: job-7 val: ./test-job.sh job-7 10
key: job-6 val: ./test-job.sh job-6 10
key: job-9 val: ./test-job.sh job-9 10
key: job-8 val: ./test-job.sh job-8 10
logDir: logs-sh
logFileSuffix: log
logFileName: jobrun-sh
intervalSeconds: 2
logFile: logs-sh/jobrun-sh-2024-06-28_14-46-18.log
maxConcurrentJobs: 3
jobrunConfigFile: jobrun.conf
jobsConfigFile: jobs.conf
debug: N
```
```text
$ ./jobrun.pl -h
jobrun.pl
usage: jobrun.pl
  The default values are found in jobrun.conf, and can be changed.
  --config-file        jobrun config file. default: jobrun.conf
  --job-config-file    jobs config file. default: jobs.conf
  --iteration-seconds  seconds between checks to run more jobs. default: 10
  --maxjobs            number of jobs to run concurrently. default: 9
  --logfile            logfile basename. default: jobrun-sem
  --logfile-suffix     logfile suffix. default: log
  --verbose            print more messages: default: 1 or on
  --debug              print debug messages: default: 1 or on
  --help               show this help.
Example:
  ./jobrun.pl --logfile-suffix=load-log --job-config-file dbjobs.conf --maxjobs 1 --nodebug --noverbose
```
When jobrun.pl starts, it will create a file 'jobrun.pid' in the current directory.
There are traps on the HUP, INT, TERM and QUIT signals.
Pressing Ctrl-C will not stop jobrun, but it will print a status message.
Pressing Ctrl-\ will kill the program and clean up semaphores.
The config file can be reloaded with HUP.
Say you started jobrun with the --noverbose and --nodebug flags, but would now like more information to appear on screen.
The following command will do that:
$ kill -1 $(cat jobrun.pid)
jobrun can also be stopped with QUIT or TERM (see kill -l)
QUIT
$ kill -3 $(cat jobrun.pid)
TERM
$ kill -15 $(cat jobrun.pid)
It may take a few moments for the children to die.
The fastest method to stop jobrun is Ctrl-\.
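
The signal behaviour described above can be sketched in Bash roughly as follows. This is not jobrun's own code: it only stops and reaps child processes (no semaphores are involved), and `print_status`, `stop_children`, and the `child_pids` array are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the signal handling described above; not jobrun's own code.
# print_status, stop_children and child_pids are hypothetical names.
child_pids=()

print_status() {
  echo "status: ${#child_pids[@]} child job(s) tracked"
}

stop_children() {
  local pid
  for pid in "${child_pids[@]}"; do
    # A child may already have exited; ignore that error.
    kill -TERM "$pid" 2>/dev/null || true
  done
  wait            # reap the children so none are left behind
  child_pids=()
}

trap 'print_status' INT                  # Ctrl-C: report and keep running
trap 'stop_children; exit 0' TERM QUIT   # stop: clean up, then exit
```

Each job started in the background would be recorded with `child_pids+=("$!")`, so that a later TERM or QUIT can reach it.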
```text
./jobrun.sh
  -c  resumable
      if the script is terminated with -TERM or -INT (CTL-C for instance)
      a temporary job configuration file is created for jobs not completed
      this file will be used to restart if -c is again used
  -i  interval seconds - default 10
  -j  jobs config file - default jobs.conf
  -r  jobrun config file - default jobrun.conf
  -m  max concurrent jobs - default 5
  -s  log directory - default logs
  -t  log file base name - default jobrun-sh
  -u  log file suffix - default log
  -d  debug on - output is to STDERR
  -n  debug off - overrides config file
  -y  dry run - read arguments, config file, show variables and exit
  -h  help
```
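
The resumable behaviour of `-c` can be pictured with a short sketch. It is not taken from jobrun.sh: it assumes each line of the jobs file has the form `name: command`, and `write_pending_jobs` is a hypothetical helper that copies every job not yet completed into a new file.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: each line of the jobs file is "name: command".
# write_pending_jobs is a hypothetical helper, not part of jobrun.sh.
# Usage: write_pending_jobs JOBS_FILE PENDING_FILE [COMPLETED_JOB_NAME...]
write_pending_jobs() {
  local jobs_file=$1 pending_file=$2
  shift 2
  local name cmd done_name skip
  : > "$pending_file"                    # start with an empty file
  while IFS=':' read -r name cmd; do
    if [ -z "$name" ]; then
      continue                           # skip blank lines
    fi
    skip=0
    for done_name in "$@"; do
      if [ "$name" = "$done_name" ]; then
        skip=1
      fi
    done
    # Keep only the jobs that have not completed.
    if [ "$skip" -eq 0 ]; then
      echo "${name}:${cmd}" >> "$pending_file"
    fi
  done < "$jobs_file"
}
```

A TERM or INT handler could call this with the names of the finished jobs, and a later run could read the pending file in place of the full jobs file.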
- track job by pid, job name and status
- resumable - persist status and results (done in bash version)
  - option - set a jobrun ID for the batch of jobs
    - used to identify file
    - or maybe just a name for the results file
  - skip jobs that have already run and have status == 1
  - option - rerun or not rerun failed jobs - status == 2
  - option - set a jobrun ID for the batch of jobs
  - check if job running when status == 2
- option - use system metric to throttle number of jobs
  - for OS - could be load (bad idea, I know, just an example)
  - for Oracle - check AAS - allow up to N jobs to run where N == Cores/2
  - code read from a config file - should return an integer
  - in main config - manually set a threshold value for chosen metric
- results
  - some kind of reporting