chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

Read error correction does not reduce the number of kmers present once, twice or three times

chklopp opened this issue

I am trying to assemble HERRO error-corrected reads with hifiasm 0.19.8.

However, the number of k-mers seen only a few times does not decrease as expected.

Initial histogram in the log

[M::ha_hist_line]     2: ****************************************************************************************************> 107716610
[M::ha_hist_line]     3: ****************************************************************************************************> 30996780
[M::ha_hist_line]     4: ************************************************************** 15593515
[M::ha_hist_line]     5: ********************************************* 11280672
[M::ha_hist_line]     6: ****************************************** 10378745
[M::ha_hist_line]     7: ******************************************** 10969244
[M::ha_hist_line]     8: ************************************************** 12440832
[M::ha_hist_line]     9: ********************************************************* 14313356
[M::ha_hist_line]    10: ****************************************************************** 16548878
[M::ha_hist_line]    11: *************************************************************************** 18834001
[M::ha_hist_line]    12: ************************************************************************************ 20983530
[M::ha_hist_line]    13: ******************************************************************************************* 22728212
[M::ha_hist_line]    14: ************************************************************************************************ 24067655
[M::ha_hist_line]    15: **************************************************************************************************** 24853609
[M::ha_hist_line]    16: **************************************************************************************************** 24957299
[M::ha_hist_line]    17: *************************************************************************************************** 24619025
[M::ha_hist_line]    18: *********************************************************************************************** 23715443
[M::ha_hist_line]    19: ****************************************************************************************** 22573874
[M::ha_hist_line]    20: ************************************************************************************* 21273382
[M::ha_hist_line]    21: ******************************************************************************** 19908321
[M::ha_hist_line]    22: *************************************************************************** 18789819
[M::ha_hist_line]    23: ************************************************************************ 18034450
[M::ha_hist_line]    24: ********************************************************************** 17581256
[M::ha_hist_line]    25: ********************************************************************** 17553086
[M::ha_hist_line]    26: ************************************************************************ 17887681
[M::ha_hist_line]    27: ************************************************************************** 18591686
[M::ha_hist_line]    28: ****************************************************************************** 19396013
[M::ha_hist_line]    29: ********************************************************************************* 20301932
[M::ha_hist_line]    30: ************************************************************************************ 21087688
[M::ha_hist_line]    31: *************************************************************************************** 21817734
[M::ha_hist_line]    32: ***************************************************************************************** 22298434
[M::ha_hist_line]    33: ****************************************************************************************** 22557746
[M::ha_hist_line]    34: ****************************************************************************************** 22440107
[M::ha_hist_line]    35: ***************************************************************************************** 22130525
[M::ha_hist_line]    36: ************************************************************************************** 21459325
[M::ha_hist_line]    37: ********************************************************************************** 20509144
[M::ha_hist_line]    38: ****************************************************************************** 19399337
[M::ha_hist_line]    39: ************************************************************************ 17962454
[M::ha_hist_line]    40: ***************************************************************** 16334878
[M::ha_hist_line]    41: *********************************************************** 14619679
[M::ha_hist_line]    42: *************************************************** 12793736
[M::ha_hist_line]    43: ******************************************** 11035127
[M::ha_hist_line]    44: ************************************** 9361763
[M::ha_hist_line]    45: ******************************* 7757802
[M::ha_hist_line]    46: ************************** 6387930
[M::ha_hist_line]    47: ********************* 5197173
[M::ha_hist_line]    48: ***************** 4122166

Second histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 79362122
[M::ha_hist_line]     2: ****************************************************************************************************> 5824705
[M::ha_hist_line]     3: ****************************************************************************************************> 1842811
[M::ha_hist_line]     4: **************************************************************************************** 934285
[M::ha_hist_line]     5: ************************************************************* 646881
[M::ha_hist_line]     6: ***************************************************** 561993
[M::ha_hist_line]     7: ***************************************************** 563891
[M::ha_hist_line]     8: ********************************************************** 610696
[M::ha_hist_line]     9: **************************************************************** 67923

Third histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 65809002
[M::ha_hist_line]     2: ****************************************************************************************************> 4505333
[M::ha_hist_line]     3: ****************************************************************************************************> 1459641
[M::ha_hist_line]     4: *************************************************************************** 774497
[M::ha_hist_line]     5: ******************************************************* 566043
[M::ha_hist_line]     6: ************************************************* 508811
[M::ha_hist_line]     7: *************************************************** 523826
[M::ha_hist_line]     8: ******************************************************** 572254

Fourth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 56509704
[M::ha_hist_line]     2: ****************************************************************************************************> 3725834
[M::ha_hist_line]     3: ****************************************************************************************************> 1243762
[M::ha_hist_line]     4: ******************************************************************** 688236
[M::ha_hist_line]     5: **************************************************** 520982
[M::ha_hist_line]     6: *********************************************** 479553
[M::ha_hist_line]     7: ************************************************** 500632
[M::ha_hist_line]     8: ****************************************************** 549716

Fifth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 50554387
[M::ha_hist_line]     2: ****************************************************************************************************> 3283946
[M::ha_hist_line]     3: ****************************************************************************************************> 1127797
[M::ha_hist_line]     4: **************************************************************** 642271
[M::ha_hist_line]     5: ************************************************** 496973
[M::ha_hist_line]     6: ********************************************** 463586
[M::ha_hist_line]     7: ************************************************* 486543
[M::ha_hist_line]     8: ****************************************************** 534726

The resulting assembly metrics are poor: the assembly is small and fragmented.
The coverage values given in the GFA files are very low.

With HiFi reads, the last histogram has very few k-mers seen once left. Which parameter could I tweak to improve this?
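To put numbers on how the low-count bins change between histograms, the `[M::ha_hist_line]` records can be pulled out of the log and summed. This is a minimal sketch, not part of hifiasm: the helper names `parse_histograms` and `low_count_kmers` are made up for illustration, and it assumes each new histogram can be recognised by the occurrence column restarting at a lower value.

```python
import re

# Matches records such as "[M::ha_hist_line]     2: ****> 107716610".
# The leading "[" is optional so that lines clipped during copy/paste still parse.
HIST_LINE = re.compile(r"\[?M::ha_hist_line\]\s+(\d+):\s+\**>?\s*(\d+)")


def parse_histograms(log_text):
    """Return one {occurrence: n_kmers} dict per histogram found in the log text.

    Assumption: a new histogram starts whenever the occurrence value does not
    increase relative to the previous record. Lines without a numeric
    occurrence are skipped.
    """
    histograms = []
    current = None
    prev_occ = None
    for line in log_text.splitlines():
        match = HIST_LINE.search(line)
        if match is None:
            continue
        occ, n_kmers = int(match.group(1)), int(match.group(2))
        if current is None or occ <= prev_occ:
            current = {}
            histograms.append(current)
        current[occ] = n_kmers
        prev_occ = occ
    return histograms


def low_count_kmers(hist, max_occ=3):
    """Sum the k-mers seen at most max_occ times in one histogram."""
    return sum(n for occ, n in hist.items() if occ <= max_occ)


if __name__ == "__main__":
    # First three bins of the second and fifth histograms quoted above
    # (asterisk bars shortened for readability).
    excerpt = """\
[M::ha_hist_line]     1: ****> 79362122
[M::ha_hist_line]     2: ****> 5824705
[M::ha_hist_line]     3: ****> 1842811
[M::ha_hist_line]     1: ****> 50554387
[M::ha_hist_line]     2: ****> 3283946
[M::ha_hist_line]     3: ****> 1127797
"""
    for index, hist in enumerate(parse_histograms(excerpt), start=1):
        print(index, low_count_kmers(hist))
```

Running this on the full log rather than on the pasted excerpts gives one total per histogram, which makes it easier to see whether the low-count bins are shrinking at all.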

I'm a fellow user, so I don't have a definitive answer. Do you have histograms from before and after HERRO? I assume the histograms above are all from after HERRO. Were the input reads HiFi only, or HiFi plus HERRO-corrected ONT reads? If they were HiFi, then perhaps HERRO did not make any improvement because the reads were already very accurate. I suggest looking at the read mappings for answers, since errors can generally be seen clearly there.
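As a rough way to follow up on the read-mapping suggestion, per-read identity can be estimated from a PAF alignment file. This is a sketch under assumptions: the reads have been mapped to a reference or draft assembly with a mapper that writes PAF (minimap2, for example), and `paf_identities` is a hypothetical helper name. It uses only the standard PAF columns 10 (matching bases) and 11 (alignment block length), so the value is a coarse identity that also penalises gaps.

```python
def paf_identities(paf_lines):
    """Yield (read_name, identity) for each PAF record.

    Identity is column 10 (number of matching bases) divided by
    column 11 (alignment block length). Records with fewer than the
    12 mandatory PAF columns, or a zero block length, are skipped.
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        matches = int(fields[9])
        block_len = int(fields[10])
        if block_len > 0:
            yield fields[0], matches / block_len


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    example = ["read1\t1000\t0\t1000\t+\tctg1\t5000\t100\t1100\t990\t1000\t60\n"]
    for name, identity in paf_identities(example):
        print(name, round(identity, 4))
```

Comparing the identity distribution of the reads before and after correction would show whether HERRO changed them noticeably.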