Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

How to improve the N50 and reduce contigs numbers?

cj2jy opened this issue · comments

Hi, I finished an assembly and the result is:

Type    Length (bp)    Count (#)
N10     22880485       3
N20     10335838       10
N30     6877938        22
N40     5222529        39
N50     3377214        63
N60     1919981        103
N70     927783         178
N80     440773         335
N90     218142         666

Min.    28326          -
Max.    57348206       -
Ave.    742827         -
Total   992417917      1336
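
For readers unfamiliar with the table above, here is a minimal sketch of how Nx values such as N50 are usually computed from a list of contig lengths. It is not NextDenovo's own code; the function name `nx_stats` and the toy lengths are made up for illustration.

```python
def nx_stats(lengths, fraction):
    """Return (Nx length, contig count) for a fraction, e.g. 0.5 for N50.

    Contigs are sorted longest first; Nx is the length of the contig at
    which the running total first reaches `fraction` of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the assembly above).
toy = [10, 8, 6, 4, 2]
print("N50:", nx_stats(toy, 0.5))
print("N90:", nx_stats(toy, 0.9))
```

Read this way, the N50 row above says that the 63 longest contigs together cover at least half of the 992417917 bp assembly, and the shortest of those 63 is 3377214 bp.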

run.cfg:

[General]
job_type = slurm
submit = sbatch --cpus-per-task=20 --mem-per-cpu=4g -o {out} -e {err} {script}
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 1
parallel_jobs = 5
input_type = raw
read_type = ont
input_fofn = ./input.fofn
workdir = ./02_rundir

[correct_option]
read_cutoff = 2k
genome_size = 850M
seed_cutoff = 25000
pa_correction = 3
sort_options = -m 20g -t 18
minimap2_options_raw = -t 18
correction_options = -p 18

[assemble_option]
random_round = 20
minimap2_options_cns = -t 18 -k 23 -w 10
nextgraph_options = -a 1 -q 10

What can I do to increase the N50 and reduce the total number of contigs? I want a better result for 3d-DNA.
Looking forward to your reply. Thank you.

It's hard to say; if I had a better solution, I would have set it as the default. However, you can try tuning these parameters: seed_cutoff, and -k, -w and -f in minimap2_options_raw and minimap2_options_cns. Also, make sure you are using the latest version of NextDenovo. You could also sequence more ultra-long ONT SUP reads. Finally, you can try some other assemblers.
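
When trying different seed_cutoff values, it can help to check how much of the genome the seed reads would cover at each cutoff. The sketch below is only an illustration and is not part of NextDenovo: it assumes seeds are the reads at least seed_cutoff bases long, and `parse_size` and `seed_depth` are hypothetical helpers written for this example.

```python
# Hypothetical helpers, not NextDenovo functions.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text):
    """Convert strings like '2k', '850M' or '25000' to a base count."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def seed_depth(read_lengths, seed_cutoff, genome_size):
    """Depth contributed by reads at least seed_cutoff long (assumed rule)."""
    seed_bases = sum(n for n in read_lengths if n >= seed_cutoff)
    return seed_bases / genome_size


# Toy read lengths and a toy 1 kb genome, for illustration only.
reads = [30000, 26000, 10000]
for cutoff in (10000, 25000, 30000):
    print(cutoff, seed_depth(reads, cutoff, genome_size=1000))
```

With real data, read_lengths would come from the files listed in input.fofn, and genome_size would be parse_size("850M") to match the run.cfg above.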

Thank you, I will change those parameters and try again. But I don't know what -f means or how to optimize it. Do you have any suggestions?

try -f 0.0001 or less

Thank you, I ran it again and it is still running. Can I use my last assembly result, nd.asm.fasta, as input to run the assemble step again? Would that give a better result?

Hi @cj2jy,
I'm still trying to understand how to run NextDenovo using SLURM. Could you share your script.slurm.sh?

In the run.cfg you set submit = sbatch --cpus-per-task=20 --mem-per-cpu=4g, so that means you also set #SBATCH --cpus-per-task=20 and #SBATCH --mem-per-cpu=4g?