dfguan / purge_dups

haplotypic duplication identification tool

Setting cutoffs manually

17863952296 opened this issue

Hello Dengfeng,

I have a similar question to those posed above. The locally generated cutoff file contains: 5 -14 32 33 55 132

I am not sure whether I should specify the cutoffs manually, although several papers that used purge_dups on the same species I work with did so. I also do not know which lower, middle, and upper coverage cutoffs to choose, and would really appreciate your help.
Below is my histogram generated with hist_plot.py:
[Image: PB cov]
Using these cutoffs, purge_dups yielded an assembly with fewer duplicated arachnida BUSCOs, but many are still retained. BUSCO scores went from C:95.0%[S:30.0%,D:65.0%],F:2.0%,M:3.0% on the primary Hifiasm contigs to C:94.0%[S:65.0%,D:29.0%],F:3.0%,M:3.0% on the purged primary. Total length went from 209.696 Mb to 184.428 Mb.
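
To put those numbers side by side, here is a small sketch that parses the two BUSCO summary strings quoted above and computes the per-category change along with the amount of sequence removed. The helper name `parse_busco` is made up for illustration and is not part of BUSCO or purge_dups.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short-summary string into {category: percent}."""
    pairs = re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    return {key: float(value) for key, value in pairs}


# Summary strings copied from the post above.
before = parse_busco("C:95.0%[S:30.0%,D:65.0%],F:2.0%,M:3.0%")
after = parse_busco("C:94.0%[S:65.0%,D:29.0%],F:3.0%,M:3.0%")

# Change in percentage points for each category (after minus before).
delta = {key: round(after[key] - before[key], 1) for key in before}

# Assembly lengths in Mb, also from the post above.
len_before_mb, len_after_mb = 209.696, 184.428
removed_mb = round(len_before_mb - len_after_mb, 3)
removed_pct = round(100 * removed_mb / len_before_mb, 1)

print("BUSCO change (percentage points):", delta)
print(f"Removed {removed_mb} Mb ({removed_pct}% of the primary assembly)")
```

So duplicated BUSCOs dropped by 36 points while about 12% of the sequence was removed, which fits with a sizeable share of haplotypic duplication still being present.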

Any advice on improving the cutoffs to get this down to a 1n assembly? Is there anything else we should generate to get a better handle on what's happening here? Thank you!
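
For reference, manual cutoffs in purge_dups are passed to `calcuts` through its `-l` (lower), `-m` (haploid/diploid transition) and `-u` (upper) options. Below is a minimal sketch that only builds the command line and checks that the three values are ordered sensibly. The values 5, 33 and 132 are placeholders taken from the first, fourth and sixth fields of the auto-generated file above, on the assumption that those fields are the lower, middle and upper cutoffs; the real values should be read off the coverage histogram. `calcuts_command` and the output name `cutoffs_manual` are hypothetical, and `PB.stat` is assumed to be the `pbcstat` output.

```python
import shlex


def calcuts_command(lower: int, middle: int, upper: int,
                    stat_file: str = "PB.stat") -> str:
    """Build a calcuts command line with manually chosen cutoffs."""
    # The three cutoffs must be strictly increasing to make sense.
    if not (0 < lower < middle < upper):
        raise ValueError("expected 0 < lower < middle < upper")
    args = ["calcuts", "-l", str(lower), "-m", str(middle),
            "-u", str(upper), stat_file]
    # "cutoffs_manual" is a hypothetical output file name.
    return " ".join(shlex.quote(arg) for arg in args) + " > cutoffs_manual"


# Placeholder values only; choose real ones from the histogram.
print(calcuts_command(5, 33, 132))
```

The resulting file would then be given to `purge_dups` through its `-T` option in place of the automatically generated cutoffs.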

Hi,

You can try kmerDedup (https://github.com/xiekunwhy/kmerDedup) first, then run purge_dups.

Best,
Kun