Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

no ctg_cns.sh was generated

jurin0811 opened this issue · comments

Question or Expected behavior
A clear and concise description of your question or what you expected to happen.
In step 3 (03.ctg_graph), 03.ctg_cns.sh is not generated. The error output is:
[52387 INFO] 2022-10-19 09:28:55 ctg_align done
Traceback (most recent call last):
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 798, in main
    seq_info = blc_genome(cfg['pa_correction'], ctg_graph_output)
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 235, in blc_genome
    seq_len = int(lines[2].split(':')[-1])
ValueError: invalid literal for int() with base 10: '0.996602'
Operating system
Which operating system and version are you using?
You can use the command lsb_release -a to get it.
centOS 7.9.2009
GCC
What version of GCC are you using?
You can use the command gcc -v to get it.
4.8.5
Python
What version of Python are you using?
You can use the command python --version to get it.
3.7.13
NextDenovo
What version of NextDenovo are you using?
You can use the command nextDenovo -v to get it.
2.5.0
Additional context (Optional)
Add any other context about the problem here.
Hi, Dr. Hu! This is my problem:
In step 3 (03.ctg_graph), 03.ctg_cns.sh is not generated.
The first error was:
"[64391 INFO] 2022-10-19 07:54:43 ctg_graph done

/share/home/cau_zhul/.cbsched/1663575030.409927.shell: line 10: 64391 Segmentation fault (core dumped) ~/software/NextDenovo/nextDenovo run_copy_1.cfg"

I then applied the method suggested in "segmentation fault after ctg_graph was done #153":

"Hi,

Actually, you don't have to rerun the whole process, just see here to continue running unfinished tasks.
For the segmentation fault, I guess this is caused by the calgs function in the file lib/kit.py, so you can replace this function with the following python code:

    def calgs(infile):
        from Bio import SeqIO
        gs = 0
        # Sum the length of every record in the FASTA file
        for seq_record in SeqIO.parse(infile, "fasta"):
            gs += len(seq_record.seq)
        return gs
"
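If Biopython is not installed in the environment that runs nextDenovo, the same total can be computed with the standard library alone. The sketch below is an illustrative stand-in for the quoted replacement, not code from NextDenovo; the name calgs_plain is hypothetical, and it assumes an uncompressed FASTA file.

```python
def calgs_plain(infile):
    """Sum the lengths of all sequences in an uncompressed FASTA file."""
    gs = 0
    with open(infile) as handle:
        for line in handle:
            # Header lines start with '>'; every other line is sequence.
            if not line.startswith(">"):
                gs += len(line.strip())
    return gs
```

It takes a file path, just like the quoted calgs, and returns the summed sequence length as an int.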

Then the run stopped at step 2 (02.ctg_align.sh.done), and this error message was displayed:
"
[52387 INFO] 2022-10-19 09:28:55 ctg_align done
Traceback (most recent call last):
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 850, in <module>
    main(args)
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 798, in main
    seq_info = blc_genome(cfg['pa_correction'], ctg_graph_output)
  File "/share/home/cau_zhul/software/NextDenovo/nextDenovo", line 235, in blc_genome
    seq_len = int(lines[2].split(':')[-1])
ValueError: invalid literal for int() with base 10: '0.996602'"
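The failure in the traceback can be reproduced in isolation: int() rejects a string that holds a decimal fraction. The sketch below uses made-up input lines (the actual contents of the file read by blc_genome are not shown in this thread) and a hypothetical helper, parse_last_field, that reports the offending line instead of raising a bare ValueError.

```python
def parse_last_field(line):
    """Return the text after the last ':' as an int, with a clearer error."""
    value = line.split(":")[-1].strip()
    try:
        return int(value)
    except ValueError:
        # int() accepts only integer literals, so '0.996602' ends up here.
        raise ValueError(
            f"expected an integer after ':' but got {value!r} in line {line!r}"
        )


# Hypothetical lines; only the value '0.996602' comes from the traceback.
print(parse_last_field("length: 2135120821"))
try:
    parse_last_field("ratio: 0.996602")
except ValueError as err:
    print(err)
```

Seeing the whole line in the message would show whether the third line of the parsed output is a different statistic than the code expects.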

After checking, I found that no "nd.asm.p.fasta.blc" was generated.

Thank you!

What is your config file?

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 10
input_type = raw
read_type = ont
input_fofn = /share/home/cau_zhul/software/NextDenovo/input_copy_2.fofn
workdir = /share/home/cau_zhul/04by815assembly/01_rundir

[correct_option]
read_cutoff = 21k
seed_cutoff = 59k
genome_size = 2200000000
pa_correction = 10
sort_options = -m 40g -t 6
minimap2_options_raw = -t 6
correction_options = -p 6

[assemble_option]
minimap2_options_cns = -x ava-ont -t 6
nextgraph_options = -a 1
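The run log further down reports seed_cutoff: 59000 for the seed_cutoff = 59k set here, so suffixed values get expanded somewhere along the way. As a rough way to sanity-check a config before a long run, the sketch below reads an INI-style fragment like this one with Python's configparser. It is not how NextDenovo itself parses the file, and the helper to_int is a hypothetical name that handles only the k suffix.

```python
import configparser

CFG_TEXT = """
[General]
task = all # 'all', 'correct', 'assemble'
parallel_jobs = 10

[correct_option]
read_cutoff = 21k
seed_cutoff = 59k
genome_size = 2200000000
"""


def to_int(value):
    """Convert '59k' or '2200000000' to an int (only the k suffix is handled)."""
    value = value.strip().lower()
    if value.endswith("k"):
        return int(float(value[:-1]) * 1000)
    return int(value)


# inline_comment_prefixes strips the trailing "# ..." notes used in the file.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(CFG_TEXT)

print(parser["General"]["task"])                        # all
print(to_int(parser["correct_option"]["seed_cutoff"]))  # 59000
print(to_int(parser["correct_option"]["genome_size"]))  # 2200000000
```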

Your config file looks OK. Could you paste the complete contents of the log file and 03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh.e here?

nextDenovo.sh.e:
hostname
+ hostname
cd /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1
+ cd /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1
time /share/home/cau_zhul/software/NextDenovo/bin/nextgraph -a 1 -f /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.seqs /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ /share/home/cau_zhul/software/NextDenovo/bin/nextgraph -a 1 -f /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.seqs /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2022-10-19 10:46:37 Initialize graph and reading...
[INFO] 2022-10-19 10:47:34 Initial Node(s): 107710, Edge(s): 614092
[INFO] 2022-10-19 10:47:34 Depth stat, Mid: 50.000 Max: 100000.000 Repeat: 75.000 L:N:H: 0.028:0.612:0.361
[INFO] 2022-10-19 10:47:34 Outdegree stat, Mid: 5.000 Max: 10000.000 Repeat: 7.500 L:N:H: 0.157:0.843:0.000
[INFO] 2022-10-19 10:47:35 Chimeric node ratio: 1.454% (candidate: 2.042%)
[INFO] 2022-10-19 10:47:36 Assembly done and outputting...
[INFO] 2022-10-19 10:57:36 Assembly stat:
Type     Length (bp)    Count (#)
N10        112739159            2
N20        108795139            4
N30        104975967            6
N40         79317148            9
N50         69817248           12
N60         57138202           15
N70         52928542           19
N80         32419638           24
N90         23389277           31

Min.          181947            -
Max.       112900535            -
Ave.        23723564            -
Total     2135120821           90
[INFO] 2022-10-19 10:57:36 CMD:
/share/home/cau_zhul/software/NextDenovo/bin/nextgraph -a 1 -f /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.seqs -o nd.asm.p.fasta /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.ovls
[INFO] 2022-10-19 10:57:36 Real time: 658.937 sec; CPU: 82.452 sec; Peak RSS: 4.146 GB

real 10m58.995s
user 1m8.480s
sys 0m13.974s
touch /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh.done
+ touch /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh.done

nextDenovo.sh:
#!/bin/sh
set -xve
hostname
cd /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1
time /share/home/cau_zhul/software/NextDenovo/bin/nextgraph -a 1 -f /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.seqs /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
touch /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh.done
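The N10 to N90 rows in the nextgraph statistics above follow the usual definition: sort the contigs from longest to shortest and report the length (and running count) of the contig at which the cumulative length first reaches the given fraction of the total. Below is a minimal sketch of that standard calculation, using made-up contig lengths rather than this assembly's. The name nx is hypothetical, and nextgraph's own implementation may differ in detail.

```python
def nx(lengths, fraction):
    """Return (Nx length, contig count) for a fraction such as 0.5 for N50."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Made-up contig lengths for illustration only.
contigs = [100, 80, 60, 40, 20]
print(nx(contigs, 0.5))  # (80, 2)
print(nx(contigs, 0.9))  # (40, 4)
```

Read this way, the N50 row in the log says that the 12 longest contigs, each at least 69817248 bp, cover half of the 2135120821 bp total.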

How about the main task log? The main task log is usually located in your working directory and is named pidXXX.log.info

[8423 INFO] 2022-10-19 10:46:29 NextDenovo start...
[8423 INFO] 2022-10-19 10:46:30 version:v2.5.0 logfile:pid8423.log.info
[8423 WARNING] 2022-10-19 10:46:30 Re-write workdir
[8423 INFO] 2022-10-19 10:46:30 skip mkdir: /share/home/cau_zhul/04by815assembly/01_rundir
[8423 INFO] 2022-10-19 10:46:30 skip mkdir: /share/home/cau_zhul/04by815assembly/01_rundir/01.raw_align
[8423 INFO] 2022-10-19 10:46:30 skip mkdir: /share/home/cau_zhul/04by815assembly/01_rundir/02.cns_align
[8423 INFO] 2022-10-19 10:46:30 skip mkdir: /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph
[8423 INFO] 2022-10-19 10:46:30 options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
seed_depth: 45
blocksize: 10g
read_type: ont
job_type: local
input_type: raw
read_cutoff: 21k
parallel_jobs: 10
pa_correction: 10
seed_cutfiles: 10
seed_cutoff: 59000
job_prefix: nextDenovo
ctg_cns_options: -p 6
nextgraph_options: -a 1
genome_size: 2200000000
sort_options: -m 40g -t 6
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 6 -x ava-ont
workdir: /share/home/cau_zhul/04by815assembly/01_rundir
correction_options: -p 6 -min_len_seed 29500 -max_lq_length 10000
input_fofn: /share/home/cau_zhul/software/NextDenovo/input_copy_2.fofn
raw_aligndir: /share/home/cau_zhul/04by815assembly/01_rundir/01.raw_align
cns_aligndir: /share/home/cau_zhul/04by815assembly/01_rundir/02.cns_align
ctg_graphdir: /share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph
minimap2_options_cns: -x ava-ont -t 6 -k 17 -w 17 --minlen 2000 --maxhan1 5000
[8423 INFO] 2022-10-19 10:46:30 skip step: db_split
[8423 INFO] 2022-10-19 10:46:30 skip step: raw_align
[8423 INFO] 2022-10-19 10:46:31 skip step: sort_align
[8423 INFO] 2022-10-19 10:46:31 skip step: seed_cns
[8423 INFO] 2022-10-19 10:46:31 seed_cns finished, and final corrected reads file:
[8423 INFO] 2022-10-19 10:46:31 /share/home/cau_zhul/04by815assembly/01_rundir/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[8423 INFO] 2022-10-19 10:46:31 skip step: cns_align
[8423 INFO] 2022-10-19 10:46:37 Total jobs: 1
[8423 INFO] 2022-10-19 10:46:37 Submitted jobID:[8550] jobCmd:[/share/home/cau_zhul/04by815assembly/01_rundir/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph1/nextDenovo.sh] in the local_cycle.
[8423 INFO] 2022-10-19 10:57:37 ctg_graph done
[8423 INFO] 2022-10-19 10:58:06 Total jobs: 0
[8423 INFO] 2022-10-19 10:58:06 ctg_align done
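When comparing runs, it can help to pull just the step events out of a pidXXX.log.info file. The sketch below assumes the line layout seen in the log above, "[PID LEVEL] DATE TIME message"; step_events is a hypothetical helper, not part of NextDenovo.

```python
import re

LINE_RE = re.compile(
    r"^\[(\d+) (\w+)\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$"
)


def step_events(lines):
    """Yield (timestamp, message) for 'skip step' and '... done' log lines."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # option dumps and blank lines do not match the layout
        _pid, _level, stamp, message = match.groups()
        if message.startswith("skip step:") or message.endswith(" done"):
            yield stamp, message


sample = [
    "[8423 INFO] 2022-10-19 10:46:30 skip step: db_split",
    "rerun: 3",
    "[8423 INFO] 2022-10-19 10:57:37 ctg_graph done",
    "[8423 INFO] 2022-10-19 10:58:06 ctg_align done",
]
for stamp, message in step_events(sample):
    print(stamp, message)
```

Applied to the log above, this makes the order of events easy to see: ctg_graph finishes first and ctg_align is reported done about 30 seconds later.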

I think you can try removing the 03.ctg_graph directory (rm -rf 01_rundir/03.ctg_graph) and then rerunning.

I removed it and reran, but the error is the same as before.

I can't solve it with this limited information unless I can log into your system to debug it. So, maybe you need to try other assemblers.