Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Help with parameters for desktop run

bensprung opened this issue · comments

Hi, I can't figure out how to set various parameters in a self-consistent way. I'm on a desktop with 6 cores and 32 GB RAM, working with a yeast genome of ~12 Mb at about 230x coverage with ONT reads, so about 2.8e9 bases. The FAQ says:

parallel_jobs = M/64 # here, the 64 can be tuned within 32~64
...

[correct_option]
pa_correction = M/(TOTAL_INPUT_BASES * 1.2/4)
sort_options = -m TOTAL_INPUT_BASES * 1.2/4g -t P/pa_correction
correction_options = -p P/pa_correction
minimap2_options_raw = -t P/parallel_jobs
...

[assemble_option]
minimap2_options_cns = -t P/parallel_jobs

Since parallel_jobs comes to 32/64 < 1, I assume I should set it to 1? pa_correction comes to 32e9/(2.8e9 * 1.2/4) ≈ 38. But then P/pa_correction = 6/38 << 1, so I'm not sure how to proceed.
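
To spell out that arithmetic, here is a small calculator for the FAQ formulas with the numbers above; the script and its variable names are mine for illustration, not part of NextDenovo:

import math

M = 32                # total RAM of the machine, in GB
P = 6                 # total CPU cores
total_bases = 2.8e9   # total input bases (~230x of a ~12 Mb genome)

mem_per_sort_gb = total_bases * 1.2 / 4 / 1e9            # ~0.84, the "-m ...g" value

parallel_jobs = max(1, math.floor(M / 64))               # 32/64 = 0.5 -> at least 1
pa_correction = max(1, math.floor(M / mem_per_sort_gb))  # ~38 on paper, far more than 6 cores
sort_threads = max(1, math.floor(P / pa_correction))     # 6/38 < 1 -> at least 1
minimap2_threads = max(1, math.floor(P / parallel_jobs)) # 6/1 = 6

print(f"parallel_jobs = {parallel_jobs}")
print(f"pa_correction = {pa_correction}")
print(f"sort_options = -m {math.ceil(mem_per_sort_gb)}g -t {sort_threads}")
print(f"correction_options = -p {sort_threads}")
print(f"minimap2_options = -t {minimap2_threads}")

The ~38 for pa_correction against only 6 cores is exactly the mismatch described above.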

I also got the following warning:

*Suggested seed_cutoff (genome size: 12.00Mb, expected seed depth: 45, real seed depth: 25.00): 8721 bp
*NOTE: The read/seed length is too short, and the assembly result is unexpected and please check the assembly quality carefully. Of course, it's better to sequencing more longer reads and try again.

I left read_cutoff = 1k and I set genome_size = 12m.

  1. You cannot run NextDenovo on a computer with 32 GB of RAM; that is too little memory.
  2. As for the warning, it means the input ONT reads are too short for NextDenovo.

So I suggest you try other assemblers.
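
As a side note on the numbers in that warning, here is a rough sketch of the seed-depth bookkeeping; the definition below is my reading of the message, not something stated in this thread:

def seed_depth(read_lengths, seed_cutoff, genome_size):
    """Coverage contributed by reads that are at least seed_cutoff long (assumed definition)."""
    seed_bases = sum(length for length in read_lengths if length >= seed_cutoff)
    return seed_bases / genome_size

# Under that reading, "real seed depth: 25.00" at seed_cutoff = 8721 bp means the reads
# of at least 8721 bp add up to only ~25 * 12 Mb = ~300 Mb of the ~2.8 Gb input, i.e.
# most of the data sits in shorter reads -- which is what "too short" refers to.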

OK. How much RAM is the minimum for a 12 Mbp genome? (And I do have more ONT reads; I only gave it a subset to try it out. I think I have up to 800x coverage, definitely 400x.)

FWIW I did get a reasonable-looking assembly out using these parameters:

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 1 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = BGS1_uncorr_nextDenovo

[correct_option]
read_cutoff = 1k
genome_size = 12m # estimated genome size
sort_options = -m 8g -t 4 
minimap2_options_raw = -t 4 
correction_options = -p 1 

[assemble_option]
minimap2_options_cns = -t 4
nextgraph_options = -a 1

  1. It depends on the input data size, maximum read length, genome size, etc., so it is hard to say.
  2. You can select the top 60X longest ONT reads to do the assembly.

Thanks. What do you mean by the top 60X longest? Select the longest reads sufficient to give 60X coverage?

Yes
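
In other words: sort the reads by length and keep the longest ones until their combined length reaches roughly 60 * 12 Mb = 720 Mb. A minimal sketch of that selection outside NextDenovo, with made-up file names:

import gzip

genome_size = 12_000_000
target_bases = 60 * genome_size               # ~720 Mb of the longest reads

def read_fastq(path):
    """Yield (header, sequence, quality) records from a plain or gzipped FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()                     # '+' separator line
            qual = fh.readline().rstrip()
            yield header, seq, qual

# Loads everything into memory; fine as a sketch, but ~2.8 Gb of reads needs a few GB of RAM.
records = sorted(read_fastq("ont_reads.fastq.gz"), key=lambda r: len(r[1]), reverse=True)

kept, total = [], 0
for header, seq, qual in records:
    if total >= target_bases:
        break
    kept.append((header, seq, qual))
    total += len(seq)

with open("ont_reads.60x_longest.fastq", "w") as out:
    for header, seq, qual in kept:
        out.write(f"{header}\n{seq}\n+\n{qual}\n")

print(f"kept {len(kept)} reads, {total / 1e6:.0f} Mb (~{total / genome_size:.0f}x)")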

Got it. Thank you.