Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

[ERROR] failed to find seq file type or empty file after ctg_graph

etd530 opened this issue

Dear Devs,

I am trying to assemble the genome of a bacterial endosymbiont from PacBio CLR reads generated for its host. We have a previous assembly made with another assembler, but we want to improve it. The expected genome size is ~1.5 Mb.

When NextDenovo reaches the ctg_graph job, it exits with the error "[ERROR] failed to find seq file type or empty file", and the output file nd.asm.p.fasta is empty.

Here is the main task log: pid12305.log.txt

The config file is as follows (I also tried with defaults for seed_cutoff and for -k in sort_options):

[General]
job_type = local
job_prefix = erebia_ligea.RO_EL_949.nonBfly.nextdenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 0
parallel_jobs = 5
input_type = raw
read_type = clr
input_fofn = /scratch/etoro/erebia/nextdenovo/erebia_ligea.RO_EL_949.nonBfly.nextdenovo.fofn
workdir = /scratch/etoro/erebia/nextdenovo

[correct_option]
read_cutoff = 1k
genome_size = 1500000
pa_correction = 5
sort_options = -m 40g -t 20 -k 80
minimap2_options_raw =  -t 12
correction_options = -p 13
seed_cutoff = 10000

[assemble_option]
minimap2_options_cns =  -t 12
nextgraph_options = -a 1

OS version: Ubuntu 18.04.1 LTS
GCC version: 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
Python version: 3.7.13
NextDenovo: 2.4.0

Many thanks,

Eric

Could you paste the contents of the file /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/erebia_ligea.RO_EL_949.nonBfly.nextdenovo.sh.e here?

Here it is:

hostname
+ hostname
cd /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
+ cd /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
time /ceph/users/amackintosh/software/NextDenovo/bin/nextgraph -a 1 -f /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.seqs /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ time /ceph/users/amackintosh/software/NextDenovo/bin/nextgraph -a 1 -f /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.seqs /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2023-03-06 18:00:08 Initialize graph and reading...
[INFO] 2023-03-06 18:00:09 Initial Node(s): 684, Edge(s): 7574
[INFO] 2023-03-06 18:00:09 Depth stat, Mid: 39.000 Max: 78000.000 Repeat: 58.500 L:N:H: 0.040:0.960:0.000
[INFO] 2023-03-06 18:00:09 Outdegree stat, Mid: 11.000 Max: 22000.000 Repeat: 16.500 L:N:H: 0.052:0.948:0.000
[INFO] 2023-03-06 18:00:09 Chimeric node ratio: 0.073% (candidate: 1.169%)
[INFO] 2023-03-06 18:00:09 Assembly done and outputting...
[INFO] 2023-03-06 18:00:09 CMD:
 /ceph/users/amackintosh/software/NextDenovo/bin/nextgraph -a 1 -f /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.seqs -o nd.asm.p.fasta /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.input.ovls
[INFO] 2023-03-06 18:00:09 Real time: 1.019 sec; CPU: 1.020 sec; Peak RSS: 0.376 GB

0.74user 0.27system 0:01.02elapsed 99%CPU (0avgtext+0avgdata 393792maxresident)k
0inputs+0outputs (0major+99669minor)pagefaults 0swaps
touch /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/erebia_ligea.RO_EL_949.nonBfly.nextdenovo.sh.done
+ touch /scratch/etoro/erebia/nextdenovo/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/erebia_ligea.RO_EL_949.nonBfly.nextdenovo.sh.done

Try reducing -z and -l first. If the result is still empty, then reduce these options as well: -q -N -u -w -B -C -z -l -L -t.

After some experimentation with the suggested parameters, I managed to obtain an assembly with the expected characteristics (1,492,648 bp; GC content = 0.341) and a BUSCO completeness (C) of 97.3%.
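For anyone who wants to check their own output the same way, here is a minimal sketch of how the sequence count, total length and GC fraction of a FASTA file such as nd.asm.p.fasta could be computed. The function name `fasta_stats` is made up for illustration and is not part of NextDenovo. An empty file simply gives zero sequences and zero bases, which is the situation from the original error.

```python
import io


def fasta_stats(handle):
    """Return (n_sequences, total_bases, gc_fraction) for FASTA text.

    Hypothetical helper, not part of NextDenovo. All sequence characters
    (including N) count towards total_bases; GC is counted case-insensitively.
    """
    n_seqs = 0
    total = 0
    gc = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # header line starts a new record
            continue
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_fraction = gc / total if total else 0.0
    return n_seqs, total, gc_fraction


if __name__ == "__main__":
    # Toy input for demonstration only; replace with open("nd.asm.p.fasta").
    demo = io.StringIO(">ctg1\nACGT\nGGCC\n>ctg2\nAATT\n")
    n_seqs, total, gc_fraction = fasta_stats(demo)
    if n_seqs == 0:
        print("FASTA is empty: no contigs were written")
    else:
        print(f"{n_seqs} sequences, {total} bp, GC fraction {gc_fraction:.3f}")
```

With a real assembly, pass an open file handle instead of the toy `StringIO` input.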

In case it helps others, the values I used were -a 1 -z 4 -l 8 -q 0 -N 2 -u 2 -w 3 -B 300 -C 20 -L 5 -t 300.
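As a rough sketch for anyone repeating this, the snippet below assembles those values into a nextgraph command line and into a `nextgraph_options` line for the config file. The binary and input paths are copied from the log pasted above, and the argument order mirrors the command shown there. Whether the values were applied through the config or by rerunning nextgraph directly is not stated in this thread, so both forms are printed; the helper names are made up for illustration.

```python
import shlex

# Paths copied from the log in this thread; adjust for your own run.
NEXTGRAPH = "/ceph/users/amackintosh/software/NextDenovo/bin/nextgraph"
CTG_DIR = "/scratch/etoro/erebia/nextdenovo/03.ctg_graph"

# Option values reported as working for this dataset (not general defaults).
OPTIONS = {
    "-a": 1, "-z": 4, "-l": 8, "-q": 0, "-N": 2, "-u": 2,
    "-w": 3, "-B": 300, "-C": 20, "-L": 5, "-t": 300,
}


def options_string(options):
    """Join flag/value pairs into a single string (hypothetical helper)."""
    return " ".join(f"{flag} {value}" for flag, value in options.items())


def build_nextgraph_cmd(options, out_fasta="nd.asm.p.fasta"):
    """Build the argument list in the same order as the logged command."""
    cmd = [NEXTGRAPH]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    cmd += [
        "-f", f"{CTG_DIR}/01.ctg_graph.input.seqs",
        f"{CTG_DIR}/01.ctg_graph.input.ovls",
        "-o", out_fasta,
    ]
    return cmd


# Line for the [assemble_option] section of the config file.
print("nextgraph_options = " + options_string(OPTIONS))
# Equivalent direct invocation, shell-quoted for copy and paste.
print(" ".join(shlex.quote(part) for part in build_nextgraph_cmd(OPTIONS)))
```

Keeping the values in one dictionary makes it easier to lower one option at a time and rerun, which is how the suggestion above was followed.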

Thanks for the help!