slurmstepd: error: *** JOB CANCELLED DUE TO TIME LIMIT ***

dpaudel opened this issue 3 years ago

Describe the bug
I am running NextDenovo on a Slurm system. The job runs for about 10 minutes and is then cancelled with the following error:

Error message

[ERROR] 2021-05-11 12:06:27,882 db_stat failed: please check the following logs:
[ERROR] 2021-05-11 12:06:27,910 bberry/14-nextdenovo/01_rundir/01.raw_align/01.db_stat.sh.work/dDenovo.sh.e

~/bberry/14-nextdenovo/01_rundir/01.raw_align/01.db_stat.sh.work/dDenovo.sh.e
hostname
+ hostname
cd /bberry/14-nextdenovo/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat0
+ cd /orange/zdeng/dev.paudel/bberry/14-nextdenovo/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat0
time /apps/nextdenovo/2.4.0/bin/seq_stat -f 3k -g 1g -d 45 -o /bberry/14-nextdenovo/01_rundir/01.raw.reads.stat /bberry/14-nextdenovo/input.fofn
+ /apps/nextdenovo/2.4.0/bin/seq_stat -f 3k -g 1g -d 45 -o /bberry/14-nextdenovo/01_rundir/01.raw.reads.stat /bberry/14-nextdenovo/input.fofn
slurmstepd: error: *** JOB 867816 ON c0700a-s1 CANCELLED AT 2021-05-11T12:06:17 DUE TO TIME LIMIT ***
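NextDenovo only reports that `db_stat failed`, so when triaging failures like this it can help to scan each task's `.sh.e` file for Slurm's cancellation notice. A minimal Python sketch, assuming the notice always has the single-line shape shown above; the function and type names are hypothetical and not part of NextDenovo:

```python
import re
from typing import NamedTuple, Optional

# Matches single-line notices such as:
# slurmstepd: error: *** JOB 867816 ON c0700a-s1 CANCELLED AT 2021-05-11T12:06:17 DUE TO TIME LIMIT ***
CANCEL_RE = re.compile(
    r"slurmstepd: error: \*+ JOB (?P<job_id>\d+) ON (?P<node>\S+) "
    r"CANCELLED AT (?P<when>\S+) DUE TO (?P<reason>[A-Z ]+?) \*+"
)


class Cancellation(NamedTuple):
    job_id: int
    node: str
    when: str
    reason: str


def find_cancellation(log_text: str) -> Optional[Cancellation]:
    """Return the first Slurm cancellation notice in a task's stderr, or None."""
    m = CANCEL_RE.search(log_text)
    if m is None:
        return None
    return Cancellation(int(m["job_id"]), m["node"], m["when"], m["reason"])


line = ("slurmstepd: error: *** JOB 867816 ON c0700a-s1 "
        "CANCELLED AT 2021-05-11T12:06:17 DUE TO TIME LIMIT ***")
print(find_cancellation(line))
```

A `reason` of `TIME LIMIT` points at the job's wall-clock limit rather than at NextDenovo itself.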

Genome characteristics
Estimated genome size ~1 Gb; repeat content ~35%

Input data
160x nanopore data

Config file
[General]
job_type = slurm # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 40 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 3k
genome_size = 1g # estimated genome size
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
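The `pa_correction` comment implies a simple memory estimate for this run. A rough Python sketch, assuming `1g` means 10^9 bases and that the 160x coverage is relative to that 1 Gb estimate, so TOTAL_INPUT_BASES is about 160 Gb; the constant and function names are hypothetical:

```python
from typing import Tuple

GENOME_SIZE = 1_000_000_000  # genome_size = 1g, taken as 10**9 bases (assumption)
COVERAGE = 160               # "160x nanopore data"
PA_CORRECTION = 3            # pa_correction = 3 in the config above


def correction_memory_bytes(total_input_bases: int, parallel_tasks: int) -> Tuple[int, int]:
    """Return (per-task, total) bytes using the ~TOTAL_INPUT_BASES/4 rule of thumb."""
    per_task = total_input_bases // 4
    return per_task, per_task * parallel_tasks


total_bases = GENOME_SIZE * COVERAGE
per_task, total = correction_memory_bytes(total_bases, PA_CORRECTION)
print(f"per task: ~{per_task / 1e9:.0f} GB, all {PA_CORRECTION} tasks: ~{total / 1e9:.0f} GB")
```

Under those assumptions each correction task needs on the order of 40 GB, or about 120 GB for three tasks in parallel, which is worth checking against the nodes' memory before raising `pa_correction`.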

Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.7 (Maipo)
Release: 7.7
Codename: Maipo

GCC
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)

Python
python/3.8

NextDenovo
nextdenovo/2.4.0

Additional context (Optional)
slurm-drmaa/1.2.1.20
Is there a time-limit option that can be set so that Slurm jobs are submitted with a given time limit?

Hu Jiang · Answer 1 · Wed May 12 2021 09:46:21 GMT+0800 (China Standard Time)

Hi, see #48. Alternatively, if Slurm has a time-limit option (I do not have access to a Slurm system, so I cannot test this), you can try adding it to cluster_options, e.g. cluster_options = --cpus-per-task={cpu} --mem-per-cpu={vf} time_limited_option.
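Following that suggestion, the placeholder could be Slurm's `--time` flag. A minimal Python sketch that formats a limit and appends it to the `cluster_options` template from the answer. It assumes the slurm-drmaa native specification accepts `--time` in the same `days-hours:minutes:seconds` form that `sbatch` does (worth confirming in the slurm-drmaa documentation); the 48-hour value and the function names are purely illustrative:

```python
from datetime import timedelta

# Resource template from the answer; {cpu} and {vf} are left for NextDenovo to fill in.
BASE_OPTIONS = "--cpus-per-task={cpu} --mem-per-cpu={vf}"


def slurm_time(limit: timedelta) -> str:
    """Format a limit as D-HH:MM:SS, one of the formats sbatch accepts for --time."""
    total = int(limit.total_seconds())
    if total <= 0:
        raise ValueError("time limit must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def cluster_options_with_time(limit: timedelta, base: str = BASE_OPTIONS) -> str:
    """Append --time to the template (assumes slurm-drmaa accepts it like sbatch)."""
    return f"{base} --time={slurm_time(limit)}"


print("cluster_options = " + cluster_options_with_time(timedelta(hours=48)))
```

The printed line would go in the `[General]` section of the config (assumed location), and the requested limit still has to fit within the partition's maximum run time.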
