Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

HiFi read assembly

kibzhulab opened this issue.

Hello,
I ran into this problem when assembling a genome from HiFi data and could not solve it. Could you give me some advice? My input reads are in hifi.fasta.

Error message
[346137 INFO] 2022-03-19 11:12:05 NextDenovo start...
[346137 INFO] 2022-03-19 11:12:06 version:v2.5.0 logfile:pid346137.log.info
[346137 WARNING] 2022-03-19 11:12:06 Re-write workdir
[346137 WARNING] 2022-03-19 11:12:06 Change task "all" to "assemble", becasue the input_type is "corrected"
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/02.cns_align
[346137 INFO] 2022-03-19 11:12:06 skip mkdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/03.ctg_graph
[346137 INFO] 2022-03-19 11:12:06 skip step: db_stat
[346137 INFO] 2022-03-19 11:12:06 updated options:
rerun: 3
deltmp: 1
rewrite: 1
task: assemble
job_type: local
read_cutoff: 1k
read_type: hifi
parallel_jobs: 2
seed_depth: 40.0
pa_correction: 2
seed_cutfiles: 3
seed_cutoff: 36191
genome_size: 800000
input_type: corrected
blocksize: 5195036747
job_prefix: nextDenovo
ctg_cns_options: -sp -p 15
sort_options: -m 2g -t 2 -k 40
nextgraph_options: -a 1 -R 0.7
minimap2_options_map: -x asm20
minimap2_options_raw: -t 8 -x ava-hifi
correction_options: -p 15 -max_lq_length 10000
minimap2_options_cns: -t 8 -x ava-hifi --minide 0.1 --maxhan1 1000 -f 800
workdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir
input_fofn: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./input.fofn
raw_aligndir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align
cns_aligndir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/02.cns_align
ctg_graphdir: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/03.ctg_graph
[346137 INFO] 2022-03-19 11:12:06 summary of input data:
file: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/./01_rundir/01.raw_align/input.reads.stat
[Read length stat]
Types     Count (#)    Length (bp)
N10       11376        27275
N20       25067        24102
N30       40309        21892
N40       56974        20124
N50       75054        18577
N60       94620        17186
N70       115799       15839
N80       138883       14445
N90       164462       12859

Types     Count (#)    Bases (bp)    Depth (X)
Raw       194359       3494691165    4368.36
Filtered  0            0             0.00
Clean     194359       3494691165    4368.36

*Suggested seed_cutoff (genome size: 0.80Mb, expected seed depth: 40, real seed depth: 40.00): 36191 bp
[346137 INFO] 2022-03-19 11:12:06 skip step: split_seed
[346137 INFO] 2022-03-19 11:12:06 skip step: cns_align
[346137 INFO] 2022-03-19 11:12:06 skip step: ctg_graph
[346137 INFO] 2022-03-19 11:12:11 Total jobs: 3
[346137 INFO] 2022-03-19 11:12:11 Submitted jobID:[346439] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:12 Submitted jobID:[346519] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:12 Submitted jobID:[346561] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:13 ctg_align done
[346137 INFO] 2022-03-19 11:12:18 Total jobs: 2
[346137 INFO] 2022-03-19 11:12:18 Submitted jobID:[346857] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns1/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:19 Submitted jobID:[346893] jobCmd:[/mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/03.ctg_cns.sh.work/ctg_cns2/nextDenovo.sh] in the local_cycle.
[346137 INFO] 2022-03-19 11:12:20 ctg_cns done
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align1/cns0.fasta.sort.bam
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align2/cns2.fasta.sort.bam
[346137 INFO] 2022-03-19 11:12:20 remove temporary result: /mnt/data/Lixl/workspace/genome/02.assembly/nextdenovo/01_rundir/03.ctg_graph/02.ctg_align.sh.work/ctg_align3/cns1.fasta.sort.bam
Traceback (most recent call last):
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 821, in main
    asm, stat = gather_ctg_cns_output(cfg, task.jobs, seq_info)
  File "/mnt/data/Lixl/software/NextDenovo/nextDenovo", line 293, in gather_ctg_cns_output
    out = cal_n50_info(stat, asm + '.stat')
  File "/mnt/data/Lixl/software/NextDenovo/lib/kit.py", line 204, in cal_n50_info
    out += "%-5s %18d%20s\n" % ("Min.", stat[-1], '-')
IndexError: list index out of range
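The traceback ends in `cal_n50_info`, where `stat[-1]` is read from a list that must be empty for that IndexError to occur, most likely because the contig step produced no contigs. As a minimal sketch of the failure mode, the hypothetical helper `n_stats` below (not NextDenovo's own code) computes the same kind of Nx table as the read-length summary in the log, but fails with an explicit message when the contig list is empty instead of a bare IndexError.

```python
def n_stats(lengths, levels=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Return {level: (count, length)} Nx statistics for a list of sequence lengths.

    Hypothetical helper for illustration; not NextDenovo's cal_n50_info.
    """
    if not lengths:
        # An empty list is the situation behind `stat[-1]` raising IndexError.
        raise ValueError("no contigs were assembled; cannot compute Nx statistics")
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    stats = {}
    running = 0  # cumulative length of the sequences counted so far
    count = 0    # number of sequences counted so far
    for level in levels:
        target = total * level / 100
        # Add sequences until they cover at least `level` percent of the total.
        while running < target:
            running += ordered[count]
            count += 1
        stats[level] = (count, ordered[count - 1])
    return stats


if __name__ == "__main__":
    # The original failure mode: taking the last element of an empty list.
    stat = []
    try:
        stat[-1]
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: list index out of range

    print(n_stats([100, 50, 30, 20])[50])  # (1, 100)
    try:
        n_stats([])
    except ValueError as exc:
        print("ValueError:", exc)
```

Raising early with a descriptive message makes an empty assembly easy to distinguish from a bug in the statistics code itself.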

Genome characteristics
This is a mitochondrial genome (mtgenome) with a genome size of around 800 kb.

Input data
Total base count: 3.3G (HiFi reads in FASTA format)
Sequencing depth: 50x
Average/N50 read length: ...
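The Depth (X) column in the log summary matches total bases divided by genome_size: 3,494,691,165 bp / 800,000 bp is about 4368x, far above the 50x quoted here. The sketch below reproduces that arithmetic and adds a hypothetical illustration of picking a seed length cutoff from a target seed depth by accumulating the longest reads until seed_depth × genome_size bases are covered. The function names are illustrative, and the seed_cutoff logic is an assumption; NextDenovo's own calculation may differ.

```python
def coverage_depth(total_bases, genome_size):
    """Sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size


def seed_cutoff(read_lengths, genome_size, seed_depth):
    """Hypothetical seed-length cutoff (illustration only).

    Accumulate reads longest first and return the length of the read at which
    the running total first reaches seed_depth * genome_size bases.
    Returns None if all reads together do not reach that target.
    """
    target = seed_depth * genome_size
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


if __name__ == "__main__":
    # Numbers from the "Raw" row of the log summary above.
    depth = coverage_depth(3494691165, 800000)
    print(f"{depth:.2f}x")  # 4368.36x
    # Toy read lengths: the target is 4 x 10 = 40 bp, reached after 30 + 20.
    print(seed_cutoff([30, 20, 10, 5], genome_size=10, seed_depth=4))  # 20
```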

My configuration file run.cfg is as follows:

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 2
input_type = raw
read_type = hifi
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 800000
pa_correction = 2
sort_options = -m 2g -t 2
minimap2_options_raw = -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
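Assuming run.cfg is plain INI syntax (which it appears to be), Python's standard configparser can read it back, which is a quick way to confirm the values a run will see. That check seems worthwhile here because the log above reports input_type: corrected and switches the task to "assemble", while this file sets input_type = raw and task = all. The sketch embeds the file as a string so it is self-contained; load_cfg is a hypothetical helper name.

```python
import configparser

# The run.cfg from this issue, embedded so the sketch is self-contained.
RUN_CFG = """\
[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 2
input_type = raw
read_type = hifi
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 800000
pa_correction = 2
sort_options = -m 2g -t 2
minimap2_options_raw = -t 8
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
"""


def load_cfg(text):
    """Parse a NextDenovo-style run.cfg, assuming plain INI syntax.

    inline_comment_prefixes strips trailing comments such as
    "task = all # 'all', 'correct', 'assemble'" (they must follow whitespace).
    """
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    # To check the file on disk instead, build the same parser and call .read("run.cfg").
    cfg = load_cfg(RUN_CFG)
    general = cfg["General"]
    print("task:", general["task"])              # task: all
    print("input_type:", general["input_type"])  # input_type: raw
    print("read_type:", general["read_type"])    # read_type: hifi
    print("genome_size:", cfg["correct_option"]["genome_size"])  # genome_size: 800000
```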

GCC
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)

Python
Python 3.9.7

NextDenovo
nextDenovo v2.5.0

It seems that nextDenovo, with default options, could not assemble any contigs from your input data. Try changing some nextgraph parameters, or try other assembly tools.