Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

compare v2.4 with v2.2-beta.0

HanKMU opened this issue

Dear authors,
I tried to assemble a ~1 Gb genome from about 30X of Nanopore data with v2.2-beta.0 and got an N50 of 140,536. However, when I updated to v2.4 and reran with the same dataset, I only got an N50 of 89,931.
Do you have any suggestions about this?
Thank you!
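
For context, N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly, so the drop from 140,536 to 89,931 means the v2.4 assembly is noticeably more fragmented. A minimal sketch of the calculation in Python, using made-up contig lengths (not taken from either assembly):

def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Toy example with placeholder contig lengths.
print(n50([140536, 120000, 90000, 60000, 30000]))  # -> 120000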

Here is the run.cfg of v2.2-beta.0.

[General]
job_type = local
job_prefix = EEG_nextDenovo
task = all 
rewrite = yes 
deltmp = yes
rerun = 3
parallel_jobs = 20
input_type = raw
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
seed_cutoff = 1193
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 10 -k 30
minimap2_options_raw = -x ava-ont -t 8  --minlen 1000
correction_options = -b

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1

Here is the run.cfg of v2.4.

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 20 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = ./20210112input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

[assemble_option]
minimap2_options_cns = -t 20 -k17 -w17
minimap2_options_map = -t 20
nextgraph_options = -a 1
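
Following the comment on pa_correction in the config above (each corrected task needs roughly TOTAL_INPUT_BASES/4 bytes of memory), the memory budget for the correction step can be estimated directly from the ~30X coverage of a 1 Gb genome described in this issue. A rough back-of-the-envelope sketch, with approximate numbers:

genome_size = 1e9          # estimated genome size from run.cfg (1G)
coverage = 30              # ~30X nanopore data, as stated above
total_input_bases = genome_size * coverage

# Per the run.cfg comment: each corrected task needs ~TOTAL_INPUT_BASES/4 bytes.
bytes_per_task = total_input_bases / 4
pa_correction = 3          # value used in the v2.4 run.cfg

print(f"per-task memory : {bytes_per_task / 1e9:.1f} GB")                  # ~7.5 GB
print(f"all tasks       : {pa_correction * bytes_per_task / 1e9:.1f} GB")  # ~22.5 GB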

The new version has updated some default parameters and algorithms. Are the seed_cutoff values of different versions the same?

No.
In v2.2-beta.0, seed_cutoff was calculated by seq_stat and came out to 1193.
In v2.4, according to the log.info, seed_cutoff was 3177.
Is there any recommendation for improving the assembly with v2.4?
The difference between the two N50 values is very large...
Thank you very much.

Maybe this is the reason. You can try setting different seed_cutoff values and check the assembly results. I have no special suggestions; if there were a better parameter set, I would have set it as the default.
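
One way to act on this is to generate one run.cfg per candidate seed_cutoff and run each in its own work directory. A rough sketch, assuming a hypothetical run.cfg.template (a copy of the v2.4 config above with {seed_cutoff} and {workdir} placeholders) and that nextDenovo is on the PATH; the candidate values are only examples:

import subprocess

candidates = [1193, 2000, 3177]  # example cutoffs to compare; adjust to your data

with open("run.cfg.template") as fh:  # hypothetical template, see note above
    template = fh.read()

for cutoff in candidates:
    cfg_path = f"run_seed{cutoff}.cfg"
    with open(cfg_path, "w") as fh:
        fh.write(template.format(seed_cutoff=cutoff,
                                 workdir=f"01_rundir_seed{cutoff}"))
    # NextDenovo is launched as "nextDenovo <config>"; run each candidate in turn.
    subprocess.run(["nextDenovo", cfg_path], check=True)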

Thanks!
Can I manually add the seed_cutoff option to the run.cfg of v2.4, as in the following example?

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_cutoff = 1193
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

yes