Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Adjusting nextgraph_options causes "IndexError: list index out of range"

DaiWeiKIB opened this issue

Describe the bug
For some reason I need to change the output format to GFA instead of FASTA, so I set nextgraph_options = -a 3. This produced an error, so I tested the option on test_data. The output is as follows:

Error message
[2327055 INFO] 2022-03-21 16:28:46 NextDenovo start...
[2327055 INFO] 2022-03-21 16:28:47 version:v2.5.0 logfile:pid2327055.log.info
[2327055 WARNING] 2022-03-21 16:28:47 Re-write workdir
[2327055 INFO] 2022-03-21 16:28:47 mkdir: /home/daiwei/NextDenovo/test_data/./01_rundir
[2327055 INFO] 2022-03-21 16:28:47 mkdir: /home/daiwei/NextDenovo/test_data/./01_rundir/01.raw_align
[2327055 INFO] 2022-03-21 16:28:47 mkdir: /home/daiwei/NextDenovo/test_data/./01_rundir/02.cns_align
[2327055 INFO] 2022-03-21 16:28:47 mkdir: /home/daiwei/NextDenovo/test_data/./01_rundir/03.ctg_graph
[2327055 INFO] 2022-03-21 16:28:52 Total jobs: 1
[2327055 INFO] 2022-03-21 16:28:52 Submitted jobID:[2327128] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:28:52 db_stat done
[2327055 INFO] 2022-03-21 16:28:52 updated options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: clr
job_type: local
input_type: raw
read_cutoff: 1k
parallel_jobs: 2
seed_depth: 45.0
pa_correction: 2
seed_cutfiles: 3
seed_cutoff: 37602
blocksize: 32533112
genome_size: 308161
job_prefix: nextDenovo
ctg_cns_options: -p 15
nextgraph_options: -a 3
sort_options: -m 1g -t 2 -k 40
minimap2_options_map: -x map-pb
minimap2_options_raw: -t 8 -x ava-pb
workdir: /home/daiwei/NextDenovo/test_data/./01_rundir
input_fofn: /home/daiwei/NextDenovo/test_data/./input.fofn
correction_options: -p 15 -max_lq_length 1000 -min_len_seed 18801
raw_aligndir: /home/daiwei/NextDenovo/test_data/./01_rundir/01.raw_align
cns_aligndir: /home/daiwei/NextDenovo/test_data/./01_rundir/02.cns_align
ctg_graphdir: /home/daiwei/NextDenovo/test_data/./01_rundir/03.ctg_graph
minimap2_options_cns: -t 8 -x ava-pb -k 17 -w 17 --minlen 2000 --maxhan1 5000
[2327055 INFO] 2022-03-21 16:28:52 summary of input data:
file: /home/daiwei/NextDenovo/test_data/./01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types Count (#) Length (bp)
N10 53 55788
N20 123 46432
N30 202 41853
N40 291 37348
N50 388 34790
N60 492 32394
N70 603 30257
N80 723 28202
N90 850 26638

Types Count (#) Bases (bp) Depth (X)
Raw 1000 34891044 113.22
Filtered 3 1724 0.01
Clean 997 34889320 113.22

Suggested seed_cutoff (genome size: 0.31Mb, expected seed depth: 45, real seed depth: 45.00): 37602 bp
[2327055 INFO] 2022-03-21 16:28:57 Total jobs: 1
[2327055 INFO] 2022-03-21 16:28:58 Submitted jobID:[2327171] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/02.db_split.sh.work/db_split1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:28:58 db_split done
[2327055 INFO] 2022-03-21 16:28:59 Total jobs: 9
[2327055 INFO] 2022-03-21 16:28:59 Submitted jobID:[2327192] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:28:59 Submitted jobID:[2327198] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:04 Submitted jobID:[2327276] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:05 Submitted jobID:[2327288] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:09 Submitted jobID:[2327343] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align5/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:10 Submitted jobID:[2327374] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align6/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:15 Submitted jobID:[2327453] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:16 Submitted jobID:[2327458] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align8/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:20 Submitted jobID:[2327541] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align9/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:25 raw_align done
[2327055 INFO] 2022-03-21 16:29:30 Total jobs: 3
[2327055 INFO] 2022-03-21 16:29:30 Submitted jobID:[2327624] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:31 Submitted jobID:[2327635] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:35 Submitted jobID:[2327670] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align3/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:39 sort_align done
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align1/input.seed.001.2bit.0.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align2/input.seed.001.2bit.1.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/input.seed.001.2bit.2.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/input.seed.001.2bit.3.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/input.seed.002.2bit.2.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align5/input.seed.002.2bit.4.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align6/input.seed.002.2bit.5.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/input.seed.002.2bit.6.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/input.seed.003.2bit.3.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/input.seed.003.2bit.6.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align8/input.seed.003.2bit.7.ovl
[2327055 INFO] 2022-03-21 16:29:39 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align9/input.seed.003.2bit.8.ovl
[2327055 INFO] 2022-03-21 16:29:44 Total jobs: 3
[2327055 INFO] 2022-03-21 16:29:44 Submitted jobID:[2327738] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:29:45 Submitted jobID:[2327761] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:03 Submitted jobID:[2327887] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns3/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:19 seed_cns done
[2327055 INFO] 2022-03-21 16:30:19 seed_cns finished, and final corrected reads file:
[2327055 INFO] 2022-03-21 16:30:19 /home/daiwei/NextDenovo/test_data/./01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns
/cns.fasta
[2327055 INFO] 2022-03-21 16:30:19 Total jobs: 6
[2327055 INFO] 2022-03-21 16:30:19 Submitted jobID:[2328012] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:20 Submitted jobID:[2328018] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:24 Submitted jobID:[2328090] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align3/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:24 Submitted jobID:[2328102] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align4/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:29 Submitted jobID:[2328175] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align5/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:30 Submitted jobID:[2328186] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align6/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:35 cns_align done
[2327055 INFO] 2022-03-21 16:30:40 Total jobs: 1
[2327055 INFO] 2022-03-21 16:30:40 Submitted jobID:[2328292] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:42 ctg_graph done
[2327055 INFO] 2022-03-21 16:30:47 Total jobs: 3
[2327055 INFO] 2022-03-21 16:30:47 Submitted jobID:[2328357] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:47 Submitted jobID:[2328438] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:48 Submitted jobID:[2328513] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:49 ctg_align done
[2327055 INFO] 2022-03-21 16:30:54 Total jobs: 2
[2327055 INFO] 2022-03-21 16:30:54 Submitted jobID:[2328632] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:54 Submitted jobID:[2328662] jobCmd:[/home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns2/nextDenovo.sh] in the local_cycle.
[2327055 INFO] 2022-03-21 16:30:55 ctg_cns done
[2327055 INFO] 2022-03-21 16:30:55 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/input.seed.001.2bit.sort.bam
[2327055 INFO] 2022-03-21 16:30:55 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/input.seed.002.2bit.sort.bam
[2327055 INFO] 2022-03-21 16:30:55 remove temporary result: /home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/input.seed.003.2bit.sort.bam
Traceback (most recent call last):
  File "/home/daiwei/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/home/daiwei/NextDenovo/nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/home/daiwei/NextDenovo/nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/home/daiwei/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range

Genome characteristics
test_data

Input data
test_data

Config file
[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 2
input_type = raw
read_type = clr
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 308161
pa_correction = 2
sort_options = -m 1g -t 2
minimap2_options_raw = -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 3

GCC
8.5.0 (gcc version 8.5.0 20210514)

Python
Both 2.7 and 3.9 give the error

NextDenovo
nextDenovo v2.5.0

This is expected behavior: -a 3 does not output FASTA, but NextDenovo requires FASTA to compute some assembly statistics. -a 3 is intended for developers only, so it is assumed that users can handle this error.
Just ignore it; you can get the GFA from /home/daiwei/NextDenovo/test_data/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/.
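To check whether a GFA file was written anywhere in the run directory (not only in `ctg_graph1/`), a recursive search is enough. This is a minimal sketch that assumes the graph output uses the `.gfa` extension; the default `./01_rundir` path comes from the `workdir` setting in the config, and `find_gfa_files` is a hypothetical helper, not part of NextDenovo.

```python
import sys
from pathlib import Path
from typing import List


def find_gfa_files(workdir: str) -> List[Path]:
    """Recursively list regular files ending in .gfa under workdir, sorted.

    Assumption: GFA output uses the .gfa extension.
    """
    root = Path(workdir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {workdir}")
    return sorted(p for p in root.rglob("*.gfa") if p.is_file())


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_gfa.py ./01_rundir
    target = sys.argv[1] if len(sys.argv) > 1 else "./01_rundir"
    try:
        hits = find_gfa_files(target)
    except FileNotFoundError as err:
        print(err)
    else:
        for path in hits:
            print(path)
        if not hits:
            print(f"no .gfa files under {target}")
```

An empty result means no file with that extension exists under the work directory at all, which narrows the question to whether the graph step wrote one.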

Thank you, Dr. Hu, for your prompt reply.

I'm afraid my problem still remains.

When nextgraph_options = -a 3, I get:

Traceback (most recent call last):
  File "/home/daiwei/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/home/daiwei/NextDenovo/nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/home/daiwei/NextDenovo/nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/home/daiwei/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range

This error prevents NextDenovo from carrying on with any further processing.
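The traceback ends in `cal_n50_info`, which formats `stat[-1]` as the "Min." row. That index raises `IndexError` whenever `stat` is empty, which is consistent with the maintainer's explanation that `-a 3` writes no FASTA for the statistics step to read. The sketch below shows a guarded version of such a summary; `n50_summary` is a hypothetical name, and treating `stat` as a list of contig lengths is an assumption inferred from the traceback, not taken from `lib/kit.py`.

```python
from typing import List, Optional


def n50_summary(lengths: List[int]) -> Optional[str]:
    """Return a small N50/Min. report, or None when there are no contigs.

    Hypothetical stand-in for cal_n50_info(): it sorts the lengths itself
    instead of assuming the caller passed them in descending order.
    """
    if not lengths:
        # The empty case is what makes stat[-1] raise IndexError.
        return None
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[0]
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of the total assembly size
            n50 = length
            break
    out = "%-5s %18d\n" % ("N50", n50)
    out += "%-5s %18d%20s\n" % ("Min.", ordered[-1], "-")
    return out


if __name__ == "__main__":
    print(n50_summary([]))  # no contigs collected
    print(n50_summary([500, 300, 200]), end="")
```

Returning `None` lets a caller skip the statistics step and report "no contigs" instead of aborting, which is the behavior this thread was hoping for.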

I'm also sure that I can't find a GFA file in
/NextDenovo/test_data/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/
with either nextgraph_options = -a 3 or nextgraph_options = -a 1.
My folder contents are shown in this figure:

[Screenshot of the folder contents]

I would appreciate your advice on this situation. Thank you sincerely, @moold.

Dear all,
I am assembling Saccharomyces HiFi data from SRR18210286, and it also crashes with this error. Could it be that I am assembling from too much data (300x depth)?

Thanks for your help,
Stephane

Traceback (most recent call last):
  File "/opt/biotools/bin/nextDenovo", line 850, in <module>
    main(args)
  File "/opt/biotools/bin/nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/opt/biotools/bin/nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/opt/biotools/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range

My config is:

[General]
job_type = local 
job_prefix = nextDenovo
task = all
rewrite = yes 
deltmp = yes
rerun = 3 
parallel_jobs = 30 
input_type = raw
read_type = hifi
input_fofn = ./input.fofn
workdir = ./nextnovo_SRR18210286
[correct_option]
read_cutoff = 2k
genome_size = 12m 
sort_options = -m 20g -t 30
minimap2_options_raw = -t 16
pa_correction = 6 
correction_options = -p 30
[assemble_option]
minimap2_options_cns = -t 16 
nextgraph_options = -a 1

I have attached my run log:

pid819296.log.info.txt
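On the depth question: the "Depth (X)" column in the first log matches total bases divided by genome size (for example, 34891044 / 308161 ≈ 113.22). The sketch below applies the same arithmetic to a read file so the actual depth can be checked before a run. It is an illustration under simplifying assumptions (FASTQ records on exactly four lines, FASTA headers starting with `>`, gzip detected by a `.gz` extension); it does not settle whether 300x is too much for NextDenovo, which this thread does not establish.

```python
import gzip
import sys
from typing import Iterator


def _open_text(path: str):
    """Open plain or gzip-compressed text transparently (by .gz extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_lengths(path: str) -> Iterator[int]:
    """Yield read lengths from FASTA (multi-line OK) or 4-line FASTQ."""
    with _open_text(path) as handle:
        first = handle.readline()
        if first.startswith("@"):
            # FASTQ: header, sequence, '+', quality (assumed 4-line records).
            seq = handle.readline()
            while seq:
                yield len(seq.strip())
                handle.readline()  # '+' line
                handle.readline()  # quality line
                handle.readline()  # next header
                seq = handle.readline()
        elif first.startswith(">"):
            length = 0
            for line in handle:
                if line.startswith(">"):
                    yield length
                    length = 0
                else:
                    length += len(line.strip())
            yield length


def depth(total_bases: int, genome_size: int) -> float:
    """Sequencing depth = total bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Usage (hypothetical script name): python depth.py reads.fastq.gz 12000000
    if len(sys.argv) == 3:
        bases = sum(read_lengths(sys.argv[1]))
        print(f"{bases} bp, {depth(bases, int(sys.argv[2])):.2f}x")
```

For the yeast run, the genome size would be passed as a plain integer, for example `12000000` for the `12m` in the config.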

It seems fixed after converting my input from fastq.gz to FASTA.
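The workaround reported here, converting the reads to FASTA, can be done with standard tools such as `seqtk seq -a`. Below is a dependency-free Python sketch instead, assuming plain four-line FASTQ records (no wrapped sequence or quality lines); `fastq_to_fasta` is a hypothetical helper. Why FASTQ input triggered the error is not explained in this thread. After converting, the new FASTA path would go in `input.fofn` in place of the FASTQ file.

```python
import gzip
import sys


def _open_text(path: str, mode: str):
    """Open plain or .gz text files transparently; mode is 'r' or 'w'."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def fastq_to_fasta(src: str, dst: str) -> int:
    """Convert 4-line FASTQ records to FASTA; return the number of records."""
    count = 0
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            seq = fin.readline().strip()
            plus = fin.readline()
            fin.readline()  # quality line is discarded
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ near record {count + 1}")
            fout.write(f">{header[1:].strip()}\n{seq}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Usage (hypothetical script name): python fq2fa.py reads.fastq.gz reads.fasta
    if len(sys.argv) == 3:
        n = fastq_to_fasta(sys.argv[1], sys.argv[2])
        print(f"wrote {n} records to {sys.argv[2]}")
```

Writing each sequence on a single line keeps the output simple; the header text after `@` is carried over unchanged.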

Dear friend,

This error still remains unsolved. The author, Dr. Hu, did not answer my question but closed my issue. You might continue your question on my new issue #143; it might help a little to remind the author to answer.

Dai Wei

Dear @DaiWeiKIB,
For me, the solution was to convert my input read data from FASTQ to FASTA. After that, it worked.
S