Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

compare v2.4 with v2.2-beta.0

HanKMU opened this issue

Dear authors,
I tried to assemble a ~1 Gb genome from about 30X of Nanopore data with v2.2-beta.0 and got an N50 of 140,536. However, when I updated to v2.4 and reran with the same dataset, I only got an N50 of 89,931.
Do you have any suggestions about this?
Thank you!
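
For context, N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly, so the drop from 140,536 to 89,931 means the v2.4 assembly is noticeably more fragmented. A minimal sketch of the calculation in Python, using made-up contig lengths (not taken from either assembly):

def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Toy example with placeholder contig lengths.
print(n50([140536, 120000, 90000, 60000, 30000]))  # -> 120000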

Here is the run.cfg of v2.2-beta.0.

[General]
job_type = local
job_prefix = EEG_nextDenovo
task = all 
rewrite = yes 
deltmp = yes
rerun = 3
parallel_jobs = 20
input_type = raw
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
seed_cutoff = 1193
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 10 -k 30
minimap2_options_raw = -x ava-ont -t 8  --minlen 1000
correction_options = -b

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1

Here is the run.cfg of v2.4.

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 20 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = ./20210112input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

[assemble_option]
minimap2_options_cns = -t 20 -k17 -w17
minimap2_options_map = -t 20
nextgraph_options = -a 1
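
Following the comment on pa_correction in the config above (each corrected task needs roughly TOTAL_INPUT_BASES/4 bytes of memory), the memory budget for the correction step can be estimated directly from the ~30X coverage of a 1 Gb genome described in this issue. A rough back-of-the-envelope sketch, with approximate numbers:

genome_size = 1e9          # estimated genome size from run.cfg (1G)
coverage = 30              # ~30X nanopore data, as stated above
total_input_bases = genome_size * coverage

# Per the run.cfg comment: each corrected task needs ~TOTAL_INPUT_BASES/4 bytes.
bytes_per_task = total_input_bases / 4
pa_correction = 3          # value used in the v2.4 run.cfg

print(f"per-task memory : {bytes_per_task / 1e9:.1f} GB")                  # ~7.5 GB
print(f"all tasks       : {pa_correction * bytes_per_task / 1e9:.1f} GB")  # ~22.5 GB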

The new version has updated some default parameters and algorithms. Are the seed_cutoff values of different versions the same?

No.
In v2.2-beta.0, seed_cutoff was calculated by seq_stat and came out to 1193.
In v2.4, according to the log.info, seed_cutoff was 3177.
Is there any recommendation for improving the assembly with v2.4?
The difference between the two N50 values is very large...
Thank you very much.

Maybe this is the reason. You can try setting different seed_cutoff values and check the assembly results. I have no special suggestions; if there were a better parameter set, I would have set it as the default.
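
One way to act on this is to generate one run.cfg per candidate seed_cutoff and run each in its own work directory. A rough sketch, assuming a hypothetical run.cfg.template (a copy of the v2.4 config above with {seed_cutoff} and {workdir} placeholders) and that nextDenovo is on the PATH; the candidate values are only examples:

import subprocess

candidates = [1193, 2000, 3177]  # example cutoffs to compare; adjust to your data

with open("run.cfg.template") as fh:  # hypothetical template, see note above
    template = fh.read()

for cutoff in candidates:
    cfg_path = f"run_seed{cutoff}.cfg"
    with open(cfg_path, "w") as fh:
        fh.write(template.format(seed_cutoff=cutoff,
                                 workdir=f"01_rundir_seed{cutoff}"))
    # NextDenovo is launched as "nextDenovo <config>"; run each candidate in turn.
    subprocess.run(["nextDenovo", cfg_path], check=True)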

Thanks!
Can I manually add the seed_cutoff option to the run.cfg of v2.4, as in the following example?

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_cutoff = 1193
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

yes