Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads


[blocksize] and [pa_correction] in run.cfg

dexon9109 opened this issue

Hi, I have the following problem:

I set two parameters, blocksize = 1g and pa_correction = 200, in run.cfg, but neither of them takes effect. The values actually used, according to pid46007.log.info, are pa_correction = 8 and blocksize = 15996475184 (note: parallel_jobs = 8). So I am confused about the level at which these parameters are applied, and I hope to get your help.

  • Linux version 3.10.0-327.el7.x86_64 (builder@kbuilder.dev.centos.org)
  • (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Thu Nov 19 22:10:57 UTC 2015
  • Python 2.7.18 :: Anaconda, Inc.
  • Nextdenovo_v2.4.0

run.cfg:

[General]
job_type = local
job_prefix = test
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 8
input_type = raw
read_type = clr
input_fofn = data3.fofn
workdir = workdir
#cluster_options = -l vf=10g,p=6 -q fat.q -S {bash} -w n #for sge

[correct_option]
read_cutoff = 1k
genome_size = 4620000000
blocksize = 1g
pa_correction = 200
seed_cutfiles = 200
sort_options = -m 12g -t 8 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 8

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1

pid46007.log.info:

rerun:                        3
task:                         all
deltmp:                       1
rewrite:                      1
read_type:                    clr
job_type:                     local
read_cutoff:                  1k
input_type:                   raw
parallel_jobs:                8
pa_correction:                8  # Sometimes this parameter ends up the same as "parallel_jobs". I guess the reason is
    # that the smaller of the two takes effect, but the description at <https://nextdenovo.readthedocs.io/en>
    # says pa_correction should "overwrite 'parallel_jobs' only for this step".
random_round:                 20
seed_depth:                   45.0
seed_cutoff:                  22630
seed_cutfiles:                200
job_prefix:                   test
blocksize:                    15996475184    # How is this value calculated?
ctg_cns_options:              -p 8
genome_size:                  4620000000
nextgraph_options:            -a 1
minimap2_options_map:         -x map-ont
minimap2_options_raw:         -x ava-ont -t 8
sort_options:                 -m 12g -t 8 -k 40 -k 40
correction_options:           -p 8 -max_lq_length 10000 -min_len_seed 11315
minimap2_options_cns:         -x ava-ont -t 8 -k17 -w17 -k 17 -w 17 --minlen 2000 --maxhan1 5000

Thank you for your reading and I am looking forward to your reply :)

Hi, if you do not set seed_cutoff, NextDenovo will calculate it and update the options related to it (such as blocksize).
Besides, pa_correction should always be much smaller than parallel_jobs, because pa_correction tasks require much more memory than other parallel_jobs tasks.
For some new users, illogical settings can crash the compute server, so NextDenovo automatically adjusts some parameters.
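
To make the first point concrete, here is a run.cfg fragment that pins seed_cutoff to the value NextDenovo reported in pid46007.log.info. The assumption, implied but not guaranteed by the reply above, is that an explicitly given seed_cutoff is kept as-is, so the options derived from it (such as blocksize) are no longer recalculated:

[correct_option]
read_cutoff = 1k
genome_size = 4620000000
seed_cutoff = 22630 # value taken from pid46007.log.info; assumed to be kept when set explicitly
blocksize = 1g # whether this is then honored as written should be checked against the documentation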

I just noticed that your config file has some errors: read_type = clr means your reads are PacBio data, while ava-ont and map-ont are only used for Nanopore reads. So, if you are not familiar with these options, just set the required ones and let NextDenovo set the omitted parameters automatically.
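
As a sketch of that advice, a pared-down run.cfg for this dataset that keeps only the options from the file above which really need to be set by hand, and drops the ONT-specific minimap2 presets so that NextDenovo can pick PacBio-appropriate defaults (the exact defaults should be checked against the documentation):

[General]
job_type = local
job_prefix = test
task = all
parallel_jobs = 8
input_type = raw
read_type = clr # PacBio CLR reads, so no ava-ont / map-ont presets are given below
input_fofn = data3.fofn
workdir = workdir

[correct_option]
read_cutoff = 1k
genome_size = 4620000000
# blocksize, pa_correction, seed_cutfiles, sort/minimap2 options and the
# [assemble_option] section are left to NextDenovo's defaults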

Sorry, I didn't notice that error. Thank you for the correction; since read_type = clr, ava-ont and map-ont will be changed to ava-pb and map-pb. However, I don't think this should have much of an impact on the overall results; the ONT presets should just be more lenient.
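
For reference, the change described there would look like this in run.cfg; the -k17 -w17 values are simply carried over from the original file and may or may not be appropriate for PacBio CLR reads:

minimap2_options_raw = -x ava-pb -t 8
minimap2_options_cns = -x ava-pb -t 8 -k17 -w17
#minimap2_options_map = -x map-pb # only if the mapping preset also needs to be overridden explicitly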