Parameters for a highly heterozygous genome

Question

A-J-F-Mackintosh opened this issue 3 years ago

Hi,

First of all, thank you for your work on this really nice assembler.

I have had success using NextDenovo v2.4.0 to assemble some insect genomes. I will soon try to assemble a 350 Mb genome with 3.5% heterozygosity from 68X PacBio CLR data.

Would you recommend changing any particular parameters? For example, I could set the genome size to 700 Mb (diploid) to reduce the minimum seed length. My thinking is that this would provide more information per haplotype, so fewer small bubbles would appear in the graph.

Many thanks,

Alex

Hu Jiang · Answer 1 · Fri Oct 22 2021 17:44:07 GMT+0800 (China Standard Time)

Could you share the assembly result with genome_size=350Mb?

A-J-F-Mackintosh · Answer 2 · Fri Oct 22 2021 17:45:34 GMT+0800 (China Standard Time)

Yes, once it is finished I will post the results here.

Alex

Hu Jiang · Answer 3 · Fri Oct 22 2021 17:46:29 GMT+0800 (China Standard Time)

OK, after that I may be able to offer some suggestions.

A-J-F-Mackintosh · Answer 4 · Sat Oct 23 2021 19:59:23 GMT+0800 (China Standard Time)

Hi,

With genome_size=350Mb the result is very good. It will need haplotig purging but that is fine.

Reads

Types            Count (#) Length (bp)
N10                  44772   43676
N20                 105555   35441
N30                 177975   30403
N40                 261524   26480
N50                 358097   22673
N60                 472811   18758
N70                 614194   14909
N80                 797644   11106
N90                1060882    7117

Types               Count (#)           Bases (bp)  Depth (X)
Raw                   1802948          23747597071      67.85
Filtered               124762             57217197       0.16
Clean                 1678186          23690379874      67.69

*Suggested seed_cutoff (genome size: 350.00Mb, expected seed depth: 45, real seed depth: 45.00): 16257 bp
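
The depth figures and the seed cutoff above follow from simple arithmetic, sketched below. This is a minimal illustration, not NextDenovo's own code: the function names `depth` and `seed_cutoff` are hypothetical, the toy read lengths are made up, and the selection rule (take the longest reads until their summed length reaches seed_depth × genome_size) is an assumption about how such a cutoff is typically derived. Under that assumption, a seed depth of 45 on 350 Mb needs 15.75 Gb, roughly two thirds of the clean bases, which is consistent with the reported 16257 bp falling between the N60 and N70 read lengths.

```python
def depth(bases: int, genome_size: int) -> float:
    """Sequencing depth = total bases / genome size."""
    return bases / genome_size


def seed_cutoff(read_lengths, genome_size, seed_depth):
    """Return the length of the shortest read needed so that all reads at
    least that long sum to seed_depth * genome_size bases.
    Returns None if the data set cannot reach the target (assumed rule)."""
    target = seed_depth * genome_size
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


# Clean bases reported in the table above.
CLEAN_BASES = 23_690_379_874
print(round(depth(CLEAN_BASES, 350_000_000), 2))  # 67.69
print(round(depth(CLEAN_BASES, 700_000_000), 2))  # 33.84

# Toy read lengths (hypothetical) to show the effect of genome_size.
toy_reads = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(seed_cutoff(toy_reads, genome_size=10, seed_depth=2))  # 8
print(seed_cutoff(toy_reads, genome_size=20, seed_depth=2))  # 6
print(seed_cutoff(toy_reads, genome_size=30, seed_depth=2))  # None
```

Doubling the genome size doubles the number of seed bases requested, so the cutoff drops. With these data, 45 × 700 Mb = 31.5 Gb exceeds the 23.69 Gb of clean bases, so by this arithmetic the requested seed depth could not be reached at genome_size=700Mb; how NextDenovo handles that case is not stated in this thread.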

Contigs

Type           Length (bp)            Count (#)
N10             10837787                   5
N20              7882032                  13
N30              5921275                  22
N40              4274629                  35
N50              3531405                  53
N60              2987626                  73
N70              2214973                  98
N80              1443093                 134
N90               702402                 196

Min.               17033                   -
Max.            16126662                   -
Ave.             1401976                   -
Total          663135083                 473
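
For readers checking these contig statistics, here is a small sketch of how an Nx value and the assembly-to-genome ratio can be computed. The helper name `nx` and the toy contig lengths are hypothetical, and the Nx convention used (first contig at which the running sum reaches x% of the total) is the common one, assumed rather than confirmed to match NextDenovo's report.

```python
def nx(lengths, x):
    """Return (length, count) of the contig at which the cumulative sum of
    contigs, sorted longest first, reaches x percent of the total."""
    threshold = sum(lengths) * x / 100
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    return None


# Toy contig lengths (hypothetical), total = 30.
toy_contigs = [10, 8, 6, 4, 2]
print(nx(toy_contigs, 50))  # (8, 2)
print(nx(toy_contigs, 90))  # (4, 4)

# Figures taken from the table above.
TOTAL_BP = 663_135_083
N_CONTIGS = 473
HAPLOID_SIZE = 350_000_000

print(TOTAL_BP // N_CONTIGS)              # 1401976, the reported average
print(round(TOTAL_BP / HAPLOID_SIZE, 2))  # 1.89
```

An assembly span of about 1.89 times the haploid genome size fits the remark that haplotig purging will be needed.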

Would it still be worth trying genome_size=700Mb, or reducing seed_depth to 30?

Best,

Alex

Hu Jiang · Answer 5 · Mon Oct 25 2021 09:22:26 GMT+0800 (China Standard Time)

You can try nextgraph_options = -A, or nextgraph_options = -q 5 (or -q 10), or seed_depth = 40, but I am not sure whether these options will improve the assembly result.
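
One way to keep track of the suggested variants is to generate a config fragment for each and merge it into an otherwise unchanged run config. The sketch below uses Python's standard `configparser`; the variant names and the `render_variant` helper are hypothetical, and the section names `[correct_option]` and `[assemble_option]` are assumed from the usual NextDenovo config layout, so check them against your own config file before use.

```python
import configparser
import io

# Each variant overrides one setting suggested in this thread.
# Section placement is an assumption; option values are copied verbatim.
VARIANTS = {
    "nextgraph_A": {"assemble_option": {"nextgraph_options": "-A"}},
    "nextgraph_q5": {"assemble_option": {"nextgraph_options": "-q 5"}},
    "nextgraph_q10": {"assemble_option": {"nextgraph_options": "-q 10"}},
    "seed_depth_40": {"correct_option": {"seed_depth": "40"}},
}


def render_variant(overrides):
    """Render a dict of {section: {key: value}} as INI-style text."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read_dict(overrides)
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


for name, overrides in VARIANTS.items():
    print(f"# variant: {name}")
    print(render_variant(overrides))
```

Running each variant in its own working directory makes the resulting assemblies easy to compare on total size and contiguity.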

A-J-F-Mackintosh · Answer 6 · Wed Oct 27 2021 02:26:01 GMT+0800 (China Standard Time)

Hi,

I tried seed_depth = 40 along with nextgraph_options = -a 1 -A. This generated a larger assembly (754 Mb) with contiguity similar to before.

I will use my first assembly as this will be easier to purge.

Thank you for the advice nonetheless.

Best,

Alex