Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Timing out on minimap-nd tasks

wgallin opened this issue

My assembly job is failing because the time limit is exceeded during some of the minimap2-nd jobs.

It appears that when tasks are run in parallel, the time allocated to each one is shorter than the time it takes to complete it.

An example log entry for a single job (it appears that 10 of the 100 submitted jobs have failed) is shown here:

Error message
    hostname
    + hostname
    cd /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/03.raw_align.sh.work/raw_align100
    + cd /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/03.raw_align.sh.work/raw_align100
    ( time /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/nextdenovo/2.5.2/bin/minimap2-nd --step 1 -I 3G -t 8 -x ava-ont /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl; )
    + /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/nextdenovo/2.5.2/bin/minimap2-nd --step 1 -I 3G -t 8 -x ava-ont /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl
    [M::mm_idx_gen::64.686*1.84] collected minimizers
    [M::mm_idx_gen::75.200*2.64] sorted minimizers
    [M::main::75.200*2.64] loaded/built the index for 107322 target sequence(s)
    [M::mm_mapopt_update::77.544*2.59] mid_occ = 1212
    [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 107322
    [M::mm_idx_stat::78.658*2.57] distinct minimizers: 95367629 (42.05% are singletons); average occurrences: 8.194; average spacing: 2.931
    [M::worker_pipeline::1280.746*7.56] mapped 25749 sequences
    [M::worker_pipeline::2627.600*7.78] mapped 20748 sequences
    slurmstepd: error: *** JOB 18227135 ON gra1100 CANCELLED AT 2024-03-30T08:38:49 DUE TO TIME LIMIT ***
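minimap2 prefixes its progress messages with [M::function::WALL*CPU], where the first number is wall-clock seconds since the program started and the second is the ratio of CPU time to wall time. Read that way, the index above was ready after about 75 s and the two mapping batches finished at roughly 1281 s and 2628 s. The shell sketch below turns such lines into a rough runtime estimate. It assumes constant throughput and that all 107322 indexed reads are also queries (the same .2bit file is passed as target and query); it is not a NextDenovo tool, and the helper name estimate_runtime is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate total minimap2 run time from its progress lines.
# ASSUMPTIONS: constant mapping throughput; TOTAL_QUERIES is supplied by the
# user (e.g. the 107322 reads reported for the index, since the same file is
# passed as target and query). estimate_runtime is a hypothetical helper.
set -euo pipefail

# estimate_runtime LOGFILE TOTAL_QUERIES
estimate_runtime() {
    awk -v total="$2" '
        # "[M::worker_pipeline::1280.746*7.56]" -> 1280.746 (wall-clock seconds)
        function stamp(field,    parts) {
            split(field, parts, "::")
            sub(/\*.*/, "", parts[3])
            return parts[3] + 0
        }
        /loaded\/built the index/ { t0 = stamp($1) }   # indexing finished
        /worker_pipeline/ && /mapped/ {
            t = stamp($1)          # time at which this batch finished
            n += $(NF - 1)         # "... mapped 25749 sequences"
        }
        END {
            if (n == 0 || t <= t0) { print "no usable progress lines"; exit 1 }
            rate = n / (t - t0)    # queries per second after indexing
            printf "mapped %d of %d queries after %.0f s; rough total: %.0f s\n",
                   n, total, t, t + (total - n) / rate
        }' "$1"
}
```

For example, estimate_runtime task.log 107322, where task.log is a hypothetical file holding the log shown above. The result is only an extrapolation, so it is best compared with the --time limit the submit command requests rather than treated as exact.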

Genome characteristics
genome size, heterozygous rate, repeat content...

Input data
This is the relevant part of the slurm.out file:

[100999 INFO] 2024-03-30 02:52:07 NextDenovo start...
[100999 INFO] 2024-03-30 02:52:08 version:Unknown logfile:pid100999.log.info
[100999 WARNING] 2024-03-30 02:52:09 Re-write workdir
[100999 INFO] 2024-03-30 02:52:09 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:52:18 Total jobs: 1
[100999 INFO] 2024-03-30 02:52:18 Submitted jobID:[18223332] jobCmd:[/scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/01.db_stat.sh.work/db_stat1/Trial02.sh] in the slurm_cycle.
[100999 INFO] 2024-03-30 02:54:20 db_stat done
[100999 INFO] 2024-03-30 02:54:20 updated options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: ont
job_type: slurm
input_type: raw
read_cutoff: 1k
pa_correction: 5
seed_cutfiles: 5
parallel_jobs: 32
seed_depth: 38.12
genome_size: 300m
seed_cutoff: 10000
job_prefix: Trial02
blocksize: 983465750
ctg_cns_options: -p 30
nextgraph_options: -a 1
sort_options: -m 50g -t 30 -k 40
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 8 -x ava-ont
input_fofn: /scratch/wgallin/NextDeNovo_Test01/input.fofn
correction_options: -p 30 -max_lq_length 10000 -r ont -min_len_seed 5000
workdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
minimap2_options_cns: -t 8 -x ava-ont -k 17 -w 17 --minlen 1000 --maxhan1 5000
raw_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
cns_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
ctg_graphdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:54:20 summary of input data:
file: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.reads.stat
[Read length stat]
Types Count (#) Length (bp)
N10 49686 39610
N20 138374 24804
N30 277076 15991
N40 488598 10686
N50 795459 7571
N60 1219406 5562
N70 1792624 4116
N80 2576448 2961
N90 3705002 1970

Types Count (#) Bases (bp) Depth (X)
Raw 7575648 28638422273 95.46
Filtered 1971087 1286477110 4.29
Clean 5604561 27351945163 91.17

*Suggested seed_cutoff (genome size: 300.00Mb, expected seed depth: 45, real seed depth: 38.12): 10000 bp
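The Depth (X) column appears to be the base count divided by genome_size (300m, i.e. 300,000,000 bp); the figures in the table match that to two decimals. A small awk-based check, using a hypothetical helper named depth:

```shell
#!/usr/bin/env bash
# Sketch: recompute the "Depth (X)" column as bases / genome size.
# ASSUMPTION: depth = bases / genome_size, with genome_size = 300m = 300,000,000 bp.
set -euo pipefail

# depth BASES GENOME_SIZE_BP -> depth rounded to two decimals
depth() {
    awk -v b="$1" -v g="$2" 'BEGIN { printf "%.2f\n", b / g }'
}

genome_size=300000000
depth 28638422273 "$genome_size"   # Raw      -> 95.46
depth 1286477110  "$genome_size"   # Filtered -> 4.29
depth 27351945163 "$genome_size"   # Clean    -> 91.17
```

At roughly 91X of clean data for a 300 Mb genome, the all-vs-all raw alignment step has a large amount of input to process.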

Config file
[General]
job_type = slurm
job_prefix = Trial02
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 32
input_type = raw
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = Trial_02_Ppen_NextDenovo_Assembly

[correct_option]
read_cutoff = 1k
genome_size = 300m # estimated genome size
sort_options = -m 50g -t 30
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1

Operating system
LSB Version: n/a
Distributor ID: Gentoo
Description: Gentoo Base System release 2.6
Release: 2.6
Codename: n/a

GCC
gcc version 9.3.0 (GCC)

Python
3.11

NextDenovo
What version of NextDenovo are you using?
2.5.2

Two solutions

  1. It seems that your system limits the running time of each job, so you can reduce blocksize and increase seed_cutfiles to make each subfile smaller and speed up each map task. The total running time may be longer, though.
  2. See here or here for how to adjust the submit command.
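A sketch of solution 1, assuming seed_cutfiles and blocksize (both listed in the "updated options" above, as 5 and 983465750 for this run) can be set in the [correct_option] section. The new values and the output file name run_small_blocks.cfg are illustrative guesses, not tested recommendations; blocksize is written as a plain byte count, the same form the log reports.

```shell
#!/usr/bin/env bash
# Sketch of solution 1: write a copy of the config with smaller split files.
# ASSUMPTIONS: run_small_blocks.cfg is a hypothetical name; seed_cutfiles = 10
# and blocksize = 500000000 are illustrative values (this run used 5 and
# 983465750). Everything else is copied from the original config.
set -euo pipefail

cfg=run_small_blocks.cfg

cat > "$cfg" <<'EOF'
[General]
job_type = slurm
job_prefix = Trial02
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 32
input_type = raw
read_type = ont
input_fofn = input.fofn
workdir = Trial_02_Ppen_NextDenovo_Assembly

[correct_option]
read_cutoff = 1k
genome_size = 300m
seed_cutfiles = 10
blocksize = 500000000
sort_options = -m 50g -t 30
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
EOF

echo "wrote $cfg"
```

Smaller subfiles mean more, shorter tasks, which is why each task is more likely to fit the per-job time limit even if the whole run takes longer.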

Hi @wgallin. I'm still trying to figure out how to run NextDenovo in an HPC environment using SLURM. Would you be able to share your script.slurm.sh with me?

Hi @wgallin,

Thanks for your response. Let's see if I understand.
So basically, you set up your script.slurm.sh to:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 32
#SBATCH --mem 256G
#SBATCH --time 7-00:00:00

# MODULES
module load nextdenovo

# MAIN
nextDenovo run.cfg

And your run.cfg to use job_type = local, one parallel job, and -t / -p set to 32:

[General]
job_type = local
parallel_jobs = 1

[correct_option]
sort_options = -t 32
minimap2_options_raw = -t 32
pa_correction = 3
correction_options = -p 32

[assemble_option]
minimap2_options_cns = -t 32
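With job_type = local and parallel_jobs = 1, the -t and -p values above should not exceed the CPUs granted by --cpus-per-task. A minimal pre-flight check, assuming thread counts appear as "-t N" or "-p N" tokens in run.cfg; check_threads is a hypothetical helper, not part of NextDenovo:

```shell
#!/usr/bin/env bash
# Sketch: warn if any -t/-p value in a NextDenovo config exceeds the CPU count.
# ASSUMPTION: thread counts appear as "-t N" or "-p N" tokens on option lines.
# check_threads is a hypothetical helper.
set -euo pipefail

# check_threads CFG CPUS -> prints offending lines, returns 1 if any exceed CPUS
check_threads() {
    awk -v cpus="$2" '
        BEGIN { limit = cpus + 0 }
        /^[[:space:]]*[#;]/ { next }              # skip comment lines
        {
            for (i = 1; i < NF; i++) {
                if (($i == "-t" || $i == "-p") && $(i + 1) + 0 > limit) {
                    print "too many threads: " $0
                    bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }' "$1"
}
```

In the Slurm script it could be called before nextDenovo run.cfg as check_threads run.cfg "$SLURM_CPUS_PER_TASK"; Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.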

Could you please verify this?
I appreciate your help,
Dani.