chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

Assembly size larger than the k-mer-estimated genome size

leon945945 opened this issue

Hi, I estimated the genome size from HiFi data. The estimated genome size is 328 Mb with 1.02% heterozygosity:
plot

I assembled the primary assembly and the phased haplotypes with hifiasm using Hi-C data. With the default hifiasm setting -s 0.55, the primary assembly is 409 Mb and the two phased haplotypes are 389 Mb and 366 Mb. All of them are larger than the estimated genome size.

I then lowered the parameter to -s 0.3. The primary assembly decreased to 396 Mb, and the two phased haplotypes decreased to 377 Mb and 356 Mb. They are still larger than the estimated size.

Could you please give me some suggestions on how to adjust the assembly size? Thanks.

@leon945945 Sorry for the late reply. The genome size estimated from k-mers might be smaller than the real genome size, because k-mer-based estimates may undercount repetitive regions. In addition, it would be better to discard contigs that are too short. Removing these small contigs may make both haplotypes smaller.
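
For the short-contig suggestion, here is a minimal Python sketch, not part of hifiasm itself. It assumes the contigs are already in FASTA format. The function names and the `min_len` cutoff are hypothetical; the thread does not recommend a threshold, so choose one that suits your data.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records: Iterable[Record], min_len: int) -> List[Record]:
    """Keep only contigs whose length is at least min_len (hypothetical cutoff)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


def total_size(records: Iterable[Record]) -> int:
    """Sum of contig lengths, i.e. the assembly size in bases."""
    return sum(len(seq) for _, seq in records)


# Toy input standing in for a real contig FASTA file.
toy_fasta = """>ctg1
ACGTACGTAC
ACGTACGTAC
>ctg2
AC
>ctg3
ACGTACGTAC
""".splitlines()

contigs = list(read_fasta(toy_fasta))
kept = filter_contigs(contigs, min_len=10)  # min_len is an assumed example value
print("size before:", total_size(contigs))
print("size after: ", total_size(kept))
print("contigs removed:", len(contigs) - len(kept))
```

For a real assembly, pass an open file handle to `read_fasta` instead of `toy_fasta`, then compare the size before and after filtering to see how much of the excess comes from short contigs.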