read_type: hifi; File ".../nextDenovo", line 856, in <module> main(args) ... IndexError: list index out of range
Ural-Yunusbaev opened this issue
Describe the bug
I run nextDenovo under SLURM on 1 node with 60 cores and 70G RAM:
sbatch --nodes=1 --ntasks=1 --cpus-per-task=60 --mem=70G ./_NextDenovo2.4_slurm.sh
When I run nextDenovo on test.ecoli.HiFi.fastq with
read_type: hifi
it reports: File ".../nextDenovo", line 856, in <module> main(args) ... IndexError: list index out of range
By contrast, when I run other read types from the same organism with
read_type: clr OR ont
it completes without errors.
I tried bacterial, insect, and plant genomes with HiFi reads and got the same error in every case.
The same organisms assemble without errors when I use CLR or ONT reads.
Error message
log message
nextDenovo /scratch/ural/ecolHiFi/run.cfg
[INFO] 2021-07-04 17:41:09,106 NextDenovo start...
[INFO] 2021-07-04 17:41:09,239 version:v2.4.0 logfile:pid99685.log.info
[WARNING] 2021-07-04 17:41:09,240 Re-write workdir
[INFO] 2021-07-04 17:41:09,242 mkdir: /scratch/ural/ecolHiFi/01_rundir
[INFO] 2021-07-04 17:41:09,243 mkdir: /scratch/ural/ecolHiFi/01_rundir/01.raw_align
[INFO] 2021-07-04 17:41:09,244 mkdir: /scratch/ural/ecolHiFi/01_rundir/02.cns_align
[INFO] 2021-07-04 17:41:09,245 mkdir: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph
[INFO] 2021-07-04 17:41:14,259 Total jobs: 1
[INFO] 2021-07-04 17:41:14,260 Submit jobID:[99688] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:18,116 db_stat done
[INFO] 2021-07-04 17:41:18,119 updated options:
rerun: 3
deltmp: 1
rewrite: 1
task: assemble
job_type: local
read_cutoff: 1k
read_type: hifi
parallel_jobs: 2
seed_depth: 40.0
pa_correction: 2
seed_cutfiles: 3
genome_size: 4.8m
seed_cutoff: 16242
input_type: corrected
blocksize: 1797250571
job_prefix: nextDenovo
ctg_cns_options: -sp -p 10
nextgraph_options: -a 1 -R 0.7
minimap2_options_map: -x asm20
sort_options: -m 40g -t 8 -k 40 -k 40
minimap2_options_raw: -t 8 -x ava-hifi
workdir: /scratch/ural/ecolHiFi/01_rundir
input_fofn: /scratch/ural/ecolHiFi/input.fofn
correction_options: -p 10 -max_lq_length 10000
raw_aligndir: /scratch/ural/ecolHiFi/01_rundir/01.raw_align
cns_aligndir: /scratch/ural/ecolHiFi/01_rundir/02.cns_align
ctg_graphdir: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph
minimap2_options_cns: -t 60 -x ava-hifi --minide 0.1 --maxhan1 1000 -f 800
[INFO] 2021-07-04 17:41:18,119 summary of input data:
file: /scratch/ural/ecolHiFi/01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types      Count (#)    Length (bp)
N10             8025          16603
N20            16625          15795
N30            25586          15240
N40            34844          14792
N50            44359          14418
N60            54112          14089
N70            64079          13800
N80            74245          13543
N90            84605          13276

Types      Count (#)     Bases (bp)    Depth (X)
Raw            95514     1389500381       289.48
Filtered           0              0         0.00
Clean          95514     1389500381       289.48
*Suggested seed_cutoff (genome size: 4.80Mb, expected seed depth: 40, real seed depth: 40.00): 16242 bp
[INFO] 2021-07-04 17:41:23,130 Total jobs: 1
[INFO] 2021-07-04 17:41:23,131 Submit jobID:[99697] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/01.split_seed.sh.work/split_seed0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:29,498 split_seed done
[INFO] 2021-07-04 17:41:29,510 Total jobs: 6
[INFO] 2021-07-04 17:41:29,511 Submit jobID:[99708] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:30,013 Submit jobID:[99713] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:33,431 Submit jobID:[99976] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align2/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:33,989 Submit jobID:[99984] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align3/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:37,261 Submit jobID:[100188] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align4/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:37,847 Submit jobID:[100254] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align5/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:41,608 cns_align done
[INFO] 2021-07-04 17:41:46,619 Total jobs: 1
[INFO] 2021-07-04 17:41:46,620 Submit jobID:[100515] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:47,627 ctg_graph done
[INFO] 2021-07-04 17:41:52,639 Total jobs: 3
[INFO] 2021-07-04 17:41:52,640 Submit jobID:[100548] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:53,141 Submit jobID:[100580] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:53,643 Submit jobID:[100612] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:41:54,651 ctg_align done
[INFO] 2021-07-04 17:41:59,666 Total jobs: 2
[INFO] 2021-07-04 17:41:59,667 Submit jobID:[100645] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns0/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:42:00,169 Submit jobID:[100664] jobCmd:[/scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[INFO] 2021-07-04 17:42:01,177 ctg_cns done
[INFO] 2021-07-04 17:42:01,178 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align0/cns2.fasta.sort.bam
[INFO] 2021-07-04 17:42:01,180 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/cns0.fasta.sort.bam
[INFO] 2021-07-04 17:42:01,181 remove temporary result: /scratch/ural/ecolHiFi/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/cns1.fasta.sort.bam
Traceback (most recent call last):
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 856, in <module>
    main(args)
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 827, in main
    asm, stat = gather_ctg_cns_output(cfg, task.subtasks, seq_info)
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo", line 291, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/homes/ural/soft/NextDenovo2.4/NextDenovo/lib/kit.py", line 171, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range
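The last frame of the traceback evaluates stat[-1]. On a Python list that raises IndexError only when the list is empty, so no contig lengths reached the statistics step. The sketch below shows one way a length summary could guard against that case. summarize_lengths is a hypothetical helper written for illustration; it is not NextDenovo's cal_n50_info, and the N50 definition used (length at which the cumulative sum of descending lengths reaches half the total) is the usual one, assumed here.

```python
def summarize_lengths(lengths):
    """Return (n50, min_len, max_len) for contig lengths, or None if empty.

    Hypothetical helper for illustration only; not NextDenovo's cal_n50_info.
    """
    if not lengths:
        # An empty list is exactly the case in which stat[-1] raises IndexError.
        return None
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # ordered[-1] is the shortest contig, ordered[0] the longest.
            return length, ordered[-1], ordered[0]


print(summarize_lengths([]))
print(summarize_lengths([10, 20, 30, 40]))
```

Returning None for an empty input lets the caller report "no contigs were produced" instead of failing while formatting the table.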
Genome characteristics
ecoli 4.8m
Input data
ecoli 4.8m from https://sra-pub-src-1.s3.amazonaws.com/SRR10971019/m54316_180808_005743.fastq.1
Config file
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = assemble # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 2 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = hifi # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir
[correct_option]
read_cutoff = 1k
genome_size = 4.8m # estimated genome size
[assemble_option]
minimap2_options_cns = -t 60
nextgraph_options = -a 1 # -q, min short branch len for output, 0=disable, set 5-16 to adjust the assembly size [0]
.
Operating system
Which operating system and version are you using?
You can use the command lsb_release -a to get it.
lsb_release -a
bash: lsb_release: command not found...
GCC
What version of GCC are you using?
You can use the command gcc -v to get it.
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
Python
What version of Python are you using?
You can use the command python --version to get it.
python3 --version
Python 3.6.8
head -1 /homes/ural/soft/NextDenovo2.4/NextDenovo/nextDenovo
#!/usr/bin/env python3
NextDenovo
What version of NextDenovo are you using?
You can use the command nextDenovo -v to get it.
nextDenovo v2.4.0
Hi, the current version has a bug that prevents it from parsing FASTQ files correctly for HiFi or corrected data. As a workaround, you can convert the FASTQ file to FASTA.
Thanks!
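The FASTQ-to-FASTA conversion suggested above can be sketched in a few lines of Python. This is a minimal sketch, assuming uncompressed input with strict four-line records (header, sequence, '+', quality); the function names and file names are hypothetical and not part of NextDenovo.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes strict 4-line records: '@' header, sequence, '+', quality.
    Records are located by position, so a quality string that happens to
    start with '@' is not mistaken for a header.
    """
    for index, line in enumerate(fastq_lines):
        position = index % 4
        line = line.rstrip("\n")
        if position == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {index + 1}: expected '@' header")
            yield ">" + line[1:]
        elif position == 1:
            yield line
        # Positions 2 ('+') and 3 (quality) are dropped.


def convert_file(src_path, dst_path):
    """Write the FASTA equivalent of src_path (FASTQ) to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in fastq_to_fasta(src):
            dst.write(out_line + "\n")
```

For example, convert_file("reads.fastq", "reads.fasta") would write the FASTA file, after which the path listed in input.fofn needs to point at the new file.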
Hi, I've encountered a similar issue.
nextDenovo worked with raw Nanopore reads, but when I changed the input to corrected reads I got a similar error message. I converted the FASTQ file to FASTA, but it still did not work.
Here is the error message:
  File "/usr/local/bin/nextDenovo", line 856, in <module>
    main(args)
  File "/usr/local/bin/nextDenovo", line 609, in main
    reset_cfg(cfg)
  File "/usr/local/bin/nextDenovo", line 530, in reset_cfg
    tcfg.update(int(g.group(1)), int(g.group(3)), float(g.group(2)))
  File "/mnt/data1/bioinfo/NextDenovo/lib/config_parser.py", line 36, in update
    gs = parse_num_unit(self.cfg['genome_size'])
  File "/mnt/data1/bioinfo/NextDenovo/lib/kit.py", line 120, in parse_num_unit
    value = float(contents[0][:-2])
ValueError: could not convert string to float: 'au'
It seems the genome_size you set is not valid. Could you paste your config file here?
Hi,
You are right! I had accidentally set genome_size to auto, and the error went away after I changed it to the estimated genome size.
Thanks for your help!
Best,
Tzu-Haw
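The 'au' in the ValueError above is what remains of the string auto after its last two characters are stripped ('auto'[:-2]). A stricter parser can reject such values with a clearer message. The following is a minimal sketch: parse_size, the accepted suffixes (k, m, g, with an optional trailing b) and the error text are assumptions made for illustration, not NextDenovo's parse_num_unit.

```python
import re

# Hypothetical unit table; which suffixes NextDenovo itself accepts is not confirmed here.
_UNITS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)([kmg]?)b?$", re.IGNORECASE)


def parse_size(text):
    """Convert strings such as '4.8m' or '1k' to a number of bases."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        # Values like 'auto' end up here instead of failing inside float().
        raise ValueError(f"genome_size must look like '4.8m', got {text!r}")
    number, unit = match.groups()
    return int(round(float(number) * _UNITS[unit.lower()]))


print(parse_size("4.8m"))
print(parse_size("1k"))
```

Validating the whole string with one pattern, rather than slicing off a fixed number of characters, means a malformed value is reported by name.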
Hi @Ural-Yunusbaev,
I'm still trying to figure out how to run NextDenovo in an HPC environment using SLURM.
Would you be able to share your NextDenovo2.4_slurm.sh and run.cfg with me?