Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Help with parameters for desktop run

bensprung opened this issue · comments

Hi, I can't figure out how to set various parameters in a self-consistent way. I'm on a desktop with 6 cores and 32 GB RAM, working with a yeast genome of ~12 Mb at about 230x coverage with ONT reads, so about 2.8e9 bases. The FAQ says:

parallel_jobs = M/64 # here, the 64 can be tuned within 32~64
...

[correct_option]
pa_correction = M/(TOTAL_INPUT_BASES * 1.2/4)
sort_options = -m TOTAL_INPUT_BASES * 1.2/4g -t P/pa_correction
correction_options = -p P/pa_correction
minimap2_options_raw = -t P/parallel_jobs
...

[assemble_option]
minimap2_options_cns = -t P/parallel_jobs

Since parallel_jobs comes to 32/64 < 1, I assume I should set it to 1? pa_correction comes to 32e9/(2.8e9 * 1.2/4) ≈ 38. But then P/pa_correction = 6/38 << 1, so I'm not sure how to proceed.
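
To spell out that arithmetic, here is a small calculator for the FAQ formulas with the numbers above; the script and its variable names are mine for illustration, not part of NextDenovo:

import math

M = 32                # total RAM of the machine, in GB
P = 6                 # total CPU cores
total_bases = 2.8e9   # total input bases (~230x of a ~12 Mb genome)

mem_per_sort_gb = total_bases * 1.2 / 4 / 1e9            # ~0.84, the "-m ...g" value

parallel_jobs = max(1, math.floor(M / 64))               # 32/64 = 0.5 -> at least 1
pa_correction = max(1, math.floor(M / mem_per_sort_gb))  # ~38 on paper, far more than 6 cores
sort_threads = max(1, math.floor(P / pa_correction))     # 6/38 < 1 -> at least 1
minimap2_threads = max(1, math.floor(P / parallel_jobs)) # 6/1 = 6

print(f"parallel_jobs = {parallel_jobs}")
print(f"pa_correction = {pa_correction}")
print(f"sort_options = -m {math.ceil(mem_per_sort_gb)}g -t {sort_threads}")
print(f"correction_options = -p {sort_threads}")
print(f"minimap2_options = -t {minimap2_threads}")

The ~38 for pa_correction against only 6 cores is exactly the mismatch described above.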

I also got the following warning:

*Suggested seed_cutoff (genome size: 12.00Mb, expected seed depth: 45, real seed depth: 25.00): 8721 bp
*NOTE: The read/seed length is too short, and the assembly result is unexpected and please check the assembly quality carefully. Of course, it's better to sequencing more longer reads and try again.

I left read_cutoff = 1k and I set genome_size = 12m.

  1. You cannot run NextDenovo on a computer with 32 GB of RAM; that is too little memory.
  2. As for the warning, it means the input ONT reads are too short for NextDenovo.

So I suggest you try other assemblers.
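
As a side note on the numbers in that warning, here is a rough sketch of the seed-depth bookkeeping; the definition below is my reading of the message, not something stated in this thread:

def seed_depth(read_lengths, seed_cutoff, genome_size):
    """Coverage contributed by reads that are at least seed_cutoff long (assumed definition)."""
    seed_bases = sum(length for length in read_lengths if length >= seed_cutoff)
    return seed_bases / genome_size

# Under that reading, "real seed depth: 25.00" at seed_cutoff = 8721 bp means the reads
# of at least 8721 bp add up to only ~25 * 12 Mb = ~300 Mb of the ~2.8 Gb input, i.e.
# most of the data sits in shorter reads -- which is what "too short" refers to.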

OK. How much RAM is the minimum for a 12 Mbp genome? (And I do have more ONT reads; I only gave it a subset to try it out. I think I have up to 800x coverage, definitely 400x.)

FWIW I did get a reasonable-looking assembly out using these parameters:

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 1 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = BGS1_uncorr_nextDenovo

[correct_option]
read_cutoff = 1k
genome_size = 12m # estimated genome size
sort_options = -m 8g -t 4 
minimap2_options_raw = -t 4 
correction_options = -p 1 

[assemble_option]
minimap2_options_cns = -t 4
nextgraph_options = -a 1

  1. It depends on the input data size, maximum read length, genome size, etc., so it is hard to say.
  2. You can select the top 60X longest ONT reads to do the assembly.

Thanks. What do you mean by the top 60X longest? Select the longest reads sufficient to give 60X coverage?

Yes
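
In other words: sort the reads by length and keep the longest ones until their combined length reaches roughly 60 * 12 Mb = 720 Mb. A minimal sketch of that selection outside NextDenovo, with made-up file names:

import gzip

genome_size = 12_000_000
target_bases = 60 * genome_size               # ~720 Mb of the longest reads

def read_fastq(path):
    """Yield (header, sequence, quality) records from a plain or gzipped FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()                     # '+' separator line
            qual = fh.readline().rstrip()
            yield header, seq, qual

# Loads everything into memory; fine as a sketch, but ~2.8 Gb of reads needs a few GB of RAM.
records = sorted(read_fastq("ont_reads.fastq.gz"), key=lambda r: len(r[1]), reverse=True)

kept, total = [], 0
for header, seq, qual in records:
    if total >= target_bases:
        break
    kept.append((header, seq, qual))
    total += len(seq)

with open("ont_reads.60x_longest.fastq", "w") as out:
    for header, seq, qual in kept:
        out.write(f"{header}\n{seq}\n+\n{qual}\n")

print(f"kept {len(kept)} reads, {total / 1e6:.0f} Mb (~{total / genome_size:.0f}x)")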

Got it. Thank you.