chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

setting K parameter in yak

DustinSokolowski opened this issue

Hello, and thank you for the excellent tool. I am planning to run hifiasm in trio mode, and I noticed that most of the documentation and other people's examples use yak to count k-mers of length 30 or 31. However, when I run hifiasm on the HiFi reads alone, my k-mer plot has a distribution around k=21, which is concordant with the reported genome size of my species. Would haplotyping work better if I set k=21 for yak, or does the k-mer length serve a different purpose in trio binning?

Thanks,
Dustin

k-mer plot

[M::ha_hist_line] 1: ****************************************************************************************************> 2022323
[M::ha_hist_line] 2: ********** 167613
[M::ha_hist_line] 3: ****** 97305
[M::ha_hist_line] 4: ***** 90440
[M::ha_hist_line] 5: ****** 104266
[M::ha_hist_line] 6: ******** 133508
[M::ha_hist_line] 7: *********** 179675
[M::ha_hist_line] 8: ************** 239751
[M::ha_hist_line] 9: ****************** 301916
[M::ha_hist_line] 10: ********************** 366589
[M::ha_hist_line] 11: ************************* 426119
[M::ha_hist_line] 12: **************************** 477479
[M::ha_hist_line] 13: ******************************** 529580
[M::ha_hist_line] 14: *********************************** 590665
[M::ha_hist_line] 15: **************************************** 666154
[M::ha_hist_line] 16: ********************************************** 766964
[M::ha_hist_line] 17: ***************************************************** 890382
[M::ha_hist_line] 18: ************************************************************** 1040052
[M::ha_hist_line] 19: ************************************************************************ 1202902
[M::ha_hist_line] 20: ********************************************************************************* 1356986
[M::ha_hist_line] 21: ***************************************************************************************** 1494264
[M::ha_hist_line] 22: *********************************************************************************************** 1595592
[M::ha_hist_line] 23: *************************************************************************************************** 1663792
[M::ha_hist_line] 24: **************************************************************************************************** 1676810
[M::ha_hist_line] 25: ************************************************************************************************** 1643687
[M::ha_hist_line] 26: ********************************************************************************************* 1564664
[M::ha_hist_line] 27: ************************************************************************************** 1448306
[M::ha_hist_line] 28: ****************************************************************************** 1306422
[M::ha_hist_line] 29: ******************************************************************** 1145983
[M::ha_hist_line] 30: *********************************************************** 990080
[M::ha_hist_line] 31: ************************************************** 838249
[M::ha_hist_line] 32: ****************************************** 700935
[M::ha_hist_line] 33: *********************************** 584442
[M::ha_hist_line] 34: ***************************** 489906
[M::ha_hist_line] 35: ************************* 417098
[M::ha_hist_line] 36: ********************** 367442
[M::ha_hist_line] 37: ******************** 332180
[M::ha_hist_line] 38: ******************* 311726
[M::ha_hist_line] 39: ****************** 303148
[M::ha_hist_line] 40: ****************** 303903
[M::ha_hist_line] 41: ****************** 308533
[M::ha_hist_line] 42: ******************* 317978
[M::ha_hist_line] 43: ******************** 329700
[M::ha_hist_line] 44: ******************** 340626
[M::ha_hist_line] 45: ********************* 351424
[M::ha_hist_line] 46: ********************* 359569
[M::ha_hist_line] 47: ********************** 366044
[M::ha_hist_line] 48: ********************** 368316
[M::ha_hist_line] 49: ********************** 366755
[M::ha_hist_line] 50: ********************* 360276
[M::ha_hist_line] 51: ********************* 351838
[M::ha_hist_line] 52: ******************** 340000
[M::ha_hist_line] 53: ******************* 324795
[M::ha_hist_line] 54: ****************** 307406
[M::ha_hist_line] 55: ***************** 289240
[M::ha_hist_line] 56: **************** 271534
[M::ha_hist_line] 57: *************** 253417
[M::ha_hist_line] 58: ************** 236214
[M::ha_hist_line] 59: ************* 219354
[M::ha_hist_line] 60: ************ 206034
[M::ha_hist_line] 61: ************ 194381
[M::ha_hist_line] 62: *********** 184370
[M::ha_hist_line] 63: ********** 175694
[M::ha_hist_line] 64: ********** 169046
[M::ha_hist_line] 65: ********** 164726
[M::ha_hist_line] 66: ********** 160449
[M::ha_hist_line] 67: ********* 159184
[M::ha_hist_line] 68: ********* 157092
[M::ha_hist_line] 69: ********* 156508
[M::ha_hist_line] 70: ********* 155059
[M::ha_hist_line] 71: ********* 155225
[M::ha_hist_line] 72: ********* 154199
[M::ha_hist_line] 73: ********* 152917
[M::ha_hist_line] 74: ********* 150678
[M::ha_hist_line] 75: ********* 148186
[M::ha_hist_line] 76: ********* 146112
[M::ha_hist_line] 77: ******** 142213
[M::ha_hist_line] 78: ******** 138570
[M::ha_hist_line] 79: ******** 133686
[M::ha_hist_line] 80: ******** 130146
[M::ha_hist_line] 81: ******* 125622
[M::ha_hist_line] 82: ******* 121009
[M::ha_hist_line] 83: ******* 116732
[M::ha_hist_line] 84: ******* 112211
[M::ha_hist_line] 85: ****** 107985
[M::ha_hist_line] 86: ****** 104312
[M::ha_hist_line] 87: ****** 100380
[M::ha_hist_line] 88: ****** 96679
[M::ha_hist_line] 89: ****** 93833
[M::ha_hist_line] 90: ***** 91256
[M::ha_hist_line] 91: ***** 89006
[M::ha_hist_line] 92: ***** 87071
[M::ha_hist_line] 93: ***** 85665
[M::ha_hist_line] 94: ***** 84023
[M::ha_hist_line] 95: ***** 82525
[M::ha_hist_line] 96: ***** 80611
[M::ha_hist_line] 97: ***** 79050
[M::ha_hist_line] 98: ***** 78590

[M::ha_hist_line] 21 reports the number of 31-mers that occur 21 times in the reads. The 21 is an occurrence count, not a k-mer length, so the line says nothing about 21-mers.
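To make the reading of the histogram concrete, here is a minimal Python sketch that parses `ha_hist_line` log lines into a mapping from occurrence count to number of k-mers and reports the local maxima. The function names `parse_hist` and `local_peaks` are hypothetical helpers written for this illustration, not part of hifiasm, and the sample text is a small excerpt of the log posted above.

```python
import re

# Matches lines such as "[M::ha_hist_line] 24: ******* 1676810".
# The leading "[" is optional because it is sometimes lost when logs are pasted.
HIST_LINE = re.compile(r"\[?M::ha_hist_line\]\s+(\d+):\s+\**>?\s*(\d+)")

def parse_hist(log_text):
    """Return {occurrence count: number of k-mers with that count}."""
    hist = {}
    for line in log_text.splitlines():
        m = HIST_LINE.search(line)
        if m:
            hist[int(m.group(1))] = int(m.group(2))
    return hist

def local_peaks(hist, min_count=2):
    """Occurrence counts whose bin is larger than both neighbouring bins.

    min_count skips the first bin, which is dominated by k-mers seen once
    (mostly sequencing errors).
    """
    peaks = []
    for c in sorted(hist):
        if c < min_count or c - 1 not in hist or c + 1 not in hist:
            continue
        if hist[c] > hist[c - 1] and hist[c] > hist[c + 1]:
            peaks.append(c)
    return peaks

# Excerpt of the histogram posted in this issue.
sample = """\
[M::ha_hist_line] 23: *** 1663792
[M::ha_hist_line] 24: *** 1676810
[M::ha_hist_line] 25: *** 1643687
"""
hist = parse_hist(sample)
print(local_peaks(hist))  # [24]
```

In the full histogram above, the two main maxima sit at occurrence counts 24 and 48. Both are coverage values for k-mers of one fixed length. Small fluctuations in the tail can also register as local maxima with this simple test, so smoothing would be needed for anything beyond a quick look.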

Got it. Using Merqury's bestK, I also get 21. Do you know whether this is also a different k? And for a rodent genome (2.5 Gb), would you suggest keeping the standard k=30? Thanks again for the clarifications.
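For context on where a value of 21 can come from, here is a short sketch of the best-k estimate as I understand it from the Merqury documentation: k = log4(G(1 - p) / p), where G is the genome size and p is a tolerable k-mer collision rate. The default p = 0.001 and the function name `best_k` are assumptions made for this illustration, so check them against the Merqury script before relying on the numbers.

```python
import math

def best_k(genome_size, collision_rate=0.001):
    """Estimate k as log base 4 of G * (1 - p) / p.

    genome_size is G in bases; collision_rate is p (0.001 is an assumed default).
    """
    return math.log(genome_size * (1 - collision_rate) / collision_rate, 4)

k = best_k(2.5e9)  # 2.5 Gb rodent genome from the question
print(round(k, 2), math.ceil(k))  # 20.59 21
```

Under these assumptions the estimate depends only on genome size and the chosen collision rate. It is the smallest k at which k-mers are expected to be mostly unique in a genome of that size, so it is a separate quantity from the occurrence counts in the hifiasm histogram.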