Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

GFA output crashes NextDenovo even on the provided test dataset

jflot opened this issue

Describe the bug
Running the test dataset with -a 3 to output a GFA causes the program to crash.

Error message
[2810717 INFO] 2022-09-27 18:10:29 NextDenovo start...
[2810717 INFO] 2022-09-27 18:10:30 version:v2.5.0 logfile:pid2810717.log.info
[2810717 WARNING] 2022-09-27 18:10:30 Re-write workdir
[2810717 INFO] 2022-09-27 18:10:30 mkdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir
[2810717 INFO] 2022-09-27 18:10:30 mkdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/01.raw_align
[2810717 INFO] 2022-09-27 18:10:30 mkdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/02.cns_align
[2810717 INFO] 2022-09-27 18:10:30 mkdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/03.ctg_graph
[2810717 INFO] 2022-09-27 18:10:35 Total jobs: 1
[2810717 INFO] 2022-09-27 18:10:35 Submitted jobID:[2810722] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/01.db_stat.sh.work/db_stat1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:35 db_stat done
[2810717 INFO] 2022-09-27 18:10:35 updated options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: clr
job_type: local
input_type: raw
read_cutoff: 1k
parallel_jobs: 2
seed_depth: 45.0
pa_correction: 2
seed_cutfiles: 3
seed_cutoff: 37602
blocksize: 32533112
genome_size: 308161
job_prefix: nextDenovo
ctg_cns_options: -p 15
nextgraph_options: -a 3
sort_options: -m 1g -t 2 -k 40
minimap2_options_map: -x map-pb
minimap2_options_raw: -t 8 -x ava-pb
workdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir
input_fofn: /srv/home/jflot/NextDenovo/test_data/./input.fofn
correction_options: -p 15 -max_lq_length 1000 -min_len_seed 18801
raw_aligndir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/01.raw_align
cns_aligndir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/02.cns_align
ctg_graphdir: /srv/home/jflot/NextDenovo/test_data/./01_rundir/03.ctg_graph
minimap2_options_cns: -t 8 -x ava-pb -k 17 -w 17 --minlen 2000 --maxhan1 5000
[2810717 INFO] 2022-09-27 18:10:35 summary of input data:
file: /srv/home/jflot/NextDenovo/test_data/./01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types Count (#) Length (bp)
N10 53 55788
N20 123 46432
N30 202 41853
N40 291 37348
N50 388 34790
N60 492 32394
N70 603 30257
N80 723 28202
N90 850 26638

Types Count (#) Bases (bp) Depth (X)
Raw 1000 34891044 113.22
Filtered 3 1724 0.01
Clean 997 34889320 113.22

Suggested seed_cutoff (genome size: 0.31Mb, expected seed depth: 45, real seed depth: 45.00): 37602 bp
[2810717 INFO] 2022-09-27 18:10:40 Total jobs: 1
[2810717 INFO] 2022-09-27 18:10:40 Submitted jobID:[2810734] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/02.db_split.sh.work/db_split1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:41 db_split done
[2810717 INFO] 2022-09-27 18:10:41 Total jobs: 9
[2810717 INFO] 2022-09-27 18:10:41 Submitted jobID:[2810742] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:41 Submitted jobID:[2810747] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:43 Submitted jobID:[2810796] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:44 Submitted jobID:[2810801] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:45 Submitted jobID:[2810852] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align5/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:45 Submitted jobID:[2810857] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align6/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:47 Submitted jobID:[2810906] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:48 Submitted jobID:[2810911] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align8/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:49 Submitted jobID:[2810960] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align9/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:51 raw_align done
[2810717 INFO] 2022-09-27 18:10:56 Total jobs: 3
[2810717 INFO] 2022-09-27 18:10:56 Submitted jobID:[2810994] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:56 Submitted jobID:[2810999] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:57 Submitted jobID:[2811014] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/04.sort_align.sh.work/sort_align3/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:10:59 sort_align done
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align1/input.seed.002.2bit.0.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align2/input.seed.002.2bit.1.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/input.seed.002.2bit.2.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/input.seed.002.2bit.3.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align3/input.seed.001.2bit.2.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align5/input.seed.001.2bit.4.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align6/input.seed.001.2bit.5.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/input.seed.001.2bit.6.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align4/input.seed.003.2bit.3.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align7/input.seed.003.2bit.6.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align8/input.seed.003.2bit.7.ovl
[2810717 INFO] 2022-09-27 18:10:59 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/01.raw_align/03.raw_align.sh.work/raw_align9/input.seed.003.2bit.8.ovl
[2810717 INFO] 2022-09-27 18:11:04 Total jobs: 3
[2810717 INFO] 2022-09-27 18:11:04 Submitted jobID:[2811030] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:04 Submitted jobID:[2811053] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:12 Submitted jobID:[2811081] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns3/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:19 seed_cns done
[2810717 INFO] 2022-09-27 18:11:19 seed_cns finished, and final corrected reads file:
[2810717 INFO] 2022-09-27 18:11:19 /srv/home/jflot/NextDenovo/test_data/./01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[2810717 INFO] 2022-09-27 18:11:19 Total jobs: 6
[2810717 INFO] 2022-09-27 18:11:19 Submitted jobID:[2811108] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:19 Submitted jobID:[2811113] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:21 Submitted jobID:[2811165] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align3/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:21 Submitted jobID:[2811171] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align4/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:22 Submitted jobID:[2811201] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align5/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:23 Submitted jobID:[2811225] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/02.cns_align/02.cns_align.sh.work/cns_align6/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:25 cns_align done
[2810717 INFO] 2022-09-27 18:11:30 Total jobs: 1
[2810717 INFO] 2022-09-27 18:11:30 Submitted jobID:[2811280] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:31 ctg_graph done
[2810717 INFO] 2022-09-27 18:11:36 Total jobs: 3
[2810717 INFO] 2022-09-27 18:11:36 Submitted jobID:[2811449] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:37 Submitted jobID:[2811524] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:37 Submitted jobID:[2811599] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:38 ctg_align done
[2810717 INFO] 2022-09-27 18:11:43 Total jobs: 2
[2810717 INFO] 2022-09-27 18:11:43 Submitted jobID:[2811680] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:44 Submitted jobID:[2811704] jobCmd:[/srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns2/nextDenovo.sh] in the local_cycle.
[2810717 INFO] 2022-09-27 18:11:45 ctg_cns done
[2810717 INFO] 2022-09-27 18:11:45 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/input.seed.002.2bit.sort.bam
[2810717 INFO] 2022-09-27 18:11:45 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/input.seed.001.2bit.sort.bam
[2810717 INFO] 2022-09-27 18:11:45 remove temporary result: /srv/home/jflot/NextDenovo/test_data/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/input.seed.003.2bit.sort.bam
Traceback (most recent call last):
  File "/srv/home/jflot/NextDenovo/./nextDenovo", line 850, in <module>
    main(args)
  File "/srv/home/jflot/NextDenovo/./nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/srv/home/jflot/NextDenovo/./nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/srv/home/jflot/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range
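
The last frame of the traceback indexes `stat[-1]`, which raises `IndexError` whenever `stat` is an empty list. The following is only a minimal sketch of that failure mode and of one possible guard. It is not the actual code in `lib/kit.py`: the function name `min_length_line` is hypothetical, and it assumes the list holds contig lengths sorted in descending order.

```python
def min_length_line(lengths):
    """Format a 'Min.' summary row, tolerating an empty length list."""
    if not lengths:
        # No sequences were gathered, so there is no minimum to report.
        return "%-5s %18s%20s\n" % ("Min.", "-", "-")
    # Assumption: lengths are sorted in descending order,
    # so the last item is the minimum.
    return "%-5s %18d%20s\n" % ("Min.", lengths[-1], "-")


# Unguarded indexing of an empty list raises the same exception type
# that appears in the traceback.
try:
    [][-1]
except IndexError as err:
    print("IndexError:", err)

print(min_length_line([]), end="")
print(min_length_line([55788, 46432, 26638]), end="")
```

A guard like this would only hide the symptom; whether an empty statistics table is acceptable for a GFA-only run is a design decision for the maintainers.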

Genome characteristics
It is the test dataset included in the NextDenovo download.

Input data
It is the test dataset included in the NextDenovo download.

Config file
Compared with the default file, I only changed the "1" at the very end into a "3".

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 2
input_type = raw
read_type = clr
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 308161
pa_correction = 2
sort_options = -m 1g -t 2
minimap2_options_raw = -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 3

Operating system
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

GCC
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

Python
Python 3.9.13

NextDenovo
v2.5.0

To Reproduce
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
tar -vxzf NextDenovo.tgz && cd NextDenovo
sed 's/nextgraph_options = -a 1/nextgraph_options = -a 3/' test_data/run.cfg > test_data/run_gfa.cfg
./nextDenovo test_data/run_gfa.cfg

This is a known bug: NextDenovo does not output sequences when run with -a 3, but the later steps require them. Try using -a 1 instead.

Yes, -a 1 works, but for my downstream analyses I absolutely need a GFA. How can I generate the GFA from my NextDenovo assembly? Is it possible to run nextgraph as a standalone tool on the NextDenovo output to generate the GFA of the assembly, and if so, could you provide an example command line?

Try running the command in the file 03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh to produce a GFA. Unfortunately, the result is slightly different from the final NextDenovo assembly in FASTA format. See here.

Thanks. I tried it on a couple of my recent bacterial assemblies, but the resulting GFAs do not contain any edges. This is odd: the genomes I am assembling are circular, so my contigs should be connected in the GFA. Another, minor issue is that the GFA does not contain the sequences of the contigs. It would be really great (and very useful to many people) if NextDenovo automatically output its final assembly in GFA format, as many popular long-read assemblers such as Flye and Raven do. Is there any prospect that this will be implemented in the near future?
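
For anyone who wants to check the same two symptoms on their own files, here is a rough Python sketch that counts segments, links, and segments without a sequence. It assumes GFA1-style tab-separated `S` and `L` records, where `*` in the sequence column means "no sequence stored"; if the file is GFA2, edges are `E` records instead and the counts below will not reflect them. The function name `summarize_gfa` and the example graph are made up for illustration.

```python
from collections import Counter


def summarize_gfa(lines):
    """Count record types in GFA text and segments whose sequence is '*'."""
    counts = Counter()
    missing_seq = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        counts[fields[0]] += 1
        # An S record with '*' (or nothing) in the sequence column
        # carries no sequence.
        if fields[0] == "S" and (len(fields) < 3 or fields[2] == "*"):
            missing_seq += 1
    return {
        "segments": counts["S"],
        "links": counts["L"],
        "segments_without_sequence": missing_seq,
    }


# Hypothetical two-contig graph, not taken from any real assembly.
example = [
    "H\tVN:Z:1.0\n",
    "S\tctg1\t*\tLN:i:1000\n",
    "S\tctg2\tACGT\n",
    "L\tctg1\t+\tctg2\t+\t0M\n",
]
print(summarize_gfa(example))
```

An open file handle can be passed in place of the list, since the function only iterates over lines. A result with zero links and every segment lacking a sequence would match what is described above.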

Hi, thanks for your great suggestion. I will consider adding it in a future version.