Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Unusual N50 and genome size from triocanu hap reads

hrluo93 opened this issue · comments

Hi,

We used ONT ultra-long haplotype reads binned by TrioCanu to assemble the haplotype genome, and we got an unusual N50 and genome size with NextDenovo v2.5.0.
The assembled genome size was about 100 Mb smaller than expected, and the N50 was quite low, only ~1 Mb.
What could cause this unusual result?

Best Wishes!
Ran

TrioCanu output:
5712315 reads 113196439599 bases written to haplotype file ./haplotype-Mat.fasta.gz.
5920759 reads 117888522482 bases written to haplotype file ./haplotype-Pat.fasta.gz.
80281 reads 163332564 bases written to haplotype file ./haplotype-unknown.fasta.gz.
722242 reads 416302535 bases filtered for being too short.
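
To sanity-check these binned totals, here is a minimal Python sketch (assuming gzipped FASTA input, matching the TrioCanu file names above) that recounts reads and bases:

import gzip
import sys

def fasta_stats(path):
    """Count reads and total bases in a (possibly gzipped) FASTA file."""
    n_reads, n_bases = 0, 0
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                n_reads += 1
            else:
                n_bases += len(line.strip())
    return n_reads, n_bases

if __name__ == "__main__":
    # e.g. python fasta_stats.py haplotype-Mat.fasta.gz
    reads, bases = fasta_stats(sys.argv[1])
    print(reads, "reads,", bases, "bases")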

seq_stat
[Read length stat]
Types Count (#) Length (bp)
N10 126241 73349
N20 310602 57003
N30 539192 47044
N40 812681 39707
N50 1134462 33909
N60 1511527 28777
N70 1967562 22918
N80 2571902 16448
N90 3483829 9907

Types Count (#) Bases (bp) Depth (X)
Raw 5920759 117888522482 117.89
Filtered 0 0 0.00
Clean 5920759 117888522482 117.89

*Suggested seed_cutoff (genome size: 1000.00Mb, expected seed depth: 45, real seed depth: 45.00): 40906 bp
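
For intuition on where this suggested cutoff comes from: the seed cutoff is the read length at which the longest reads, taken in decreasing order of length, accumulate seed_depth x genome_size bases. A minimal Python sketch of that idea (the exact seq_stat calculation may differ in details):

def suggest_seed_cutoff(read_lengths, genome_size, seed_depth):
    """Length L such that reads of length >= L together give roughly
    seed_depth X coverage of genome_size (longest reads first)."""
    target = genome_size * seed_depth
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return min(read_lengths)  # not enough data to reach the target depth

# With the numbers in this thread, this should land near 40906:
# suggest_seed_cutoff(lengths, genome_size=1_000_000_000, seed_depth=45)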

Our settings:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: ont
job_type: local
input_type: raw
genome_size: 1g
seed_depth: 45.0
parallel_jobs: 5
pa_correction: 3
seed_cutfiles: 3
read_cutoff: 25k
job_prefix: nextfe
seed_cutoff: 40906
blocksize: 11214124835
ctg_cns_options: -p 15
nextgraph_options: -a 1
sort_options: -m 20g -t 15 -k 40
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 8 -x ava-ont
correction_options: -p 15 -max_lq_length 10000 -min_len_seed 20453
minimap2_options_cns: -t 8 -x ava-ont -k 17 -w 17 --minlen 2000 --maxhan1 5000
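
For reference, these settings correspond to a NextDenovo run.cfg split into sections. A sketch of that layout follows; the section placement is my reading of the NextDenovo documentation and may not be exact, and input_fofn and workdir are placeholders, since they are not shown in the log above:

[General]
job_type = local
job_prefix = nextfe
task = all
rewrite = 1
deltmp = 1
rerun = 3
parallel_jobs = 5
input_type = raw
read_type = ont
input_fofn = input.fofn   # placeholder: file-of-filenames listing the reads
workdir = 01_rundir       # placeholder: not shown in the log above

[correct_option]
genome_size = 1g
seed_depth = 45
seed_cutoff = 40906
read_cutoff = 25k
blocksize = 11214124835
pa_correction = 3
seed_cutfiles = 3
sort_options = -m 20g -t 15 -k 40
minimap2_options_raw = -t 8 -x ava-ont
correction_options = -p 15 -max_lq_length 10000 -min_len_seed 20453

[assemble_option]
# placement of the next two keys is my assumption
minimap2_options_map = -x map-ont
ctg_cns_options = -p 15
minimap2_options_cns = -t 8 -x ava-ont -k 17 -w 17 --minlen 2000 --maxhan1 5000
nextgraph_options = -a 1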

[Read length stat]
Types Count (#) Length (bp)
N10 75719 82925
N20 182706 66596
N30 310850 56989
N40 458514 49976
N50 625760 44358
N60 813392 39692
N70 1022447 35703
N80 1254589 32159
N90 1512998 28759

Types Count (#) Bases (bp) Depth (X)
Raw 5920759 117888522482 117.89
Filtered 4115291 39249147976 39.25
Clean 1805468 78639374506 78.64
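
Note that read_cutoff = 25k discarded 4,115,291 reads (~39 Gb, ~39X), leaving 78.64X for correction. To preview how much depth a given cutoff would keep before rerunning, a small Python sketch (read lengths would be collected from the raw reads; the names are illustrative):

def depth_after_cutoff(read_lengths, genome_size, cutoff):
    """Sequencing depth retained after dropping reads shorter than
    cutoff, as NextDenovo's read_cutoff filter does."""
    return sum(l for l in read_lengths if l >= cutoff) / genome_size

# Hypothetical sweep over candidate cutoffs:
# for cutoff in (1_000, 5_000, 10_000, 25_000):
#     print(cutoff, depth_after_cutoff(lengths, 1_000_000_000, cutoff))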

Result
Type Length (bp) Count (#)
N10 4637305 14
N20 3002831 37
N30 2161021 72
N40 1744945 116
N50 1288785 173
N60 1019279 248
N70 793166 343
N80 580083 471
N90 384460 650

Min. 40232 -
Max. 12343781 -
Ave. 856464 -
Total 859033589 1003
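
As a reminder, the Nxx statistics above are the smallest length L such that sequences of length >= L cover xx% of the total bases; a generic Python sketch:

def nxx(lengths, fraction=0.5):
    """N50 with fraction=0.5, N90 with fraction=0.9, and so on."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length

# e.g. nxx(contig_lengths) should reproduce the 1,288,785 bp N50 above.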

How about the result when using all the data?

Thank you very much for your reply! I am trying an assembly with all raw ONT reads (without haplotype binning) to check whether TrioCanu dropped some reads.
I am also planning to use all of the haplotype-binned data (including the short reads) for the haplotype assembly, to check whether some needed reads are among those filtered out.
If I use all the data for the haplotype assembly, what seed_depth, seed_cutoff, and read_cutoff would you suggest based on my log? 50X, auto, and 5k?

Best Wishes!
Ran

Just try using the default values first and see how the result looks.

Thanks, Dr. Hu! I am trying read_cutoff 1k first!