bcgsc / ntHits

Identifying repeats in high-throughput sequencing data

Parameter -b

hannesbecher opened this issue · comments

Hi there,
On the ntEdit page, it is recommended to run ntHits with the -b option. But it is not clear what -b does (it does not seem to be in ntHits's help string or on this page). Could you clarify, please?
Cheers,
Hannes

commented

Hello Hannes,
Sorry for the delay in responding to you. We are still developing ntHits, but in a somewhat limited capacity at the moment.

-b is the Bloom filter bit size.

In conjunction with ntEdit, we advise our users to set -b to 36 or higher. Of course, larger values will increase the memory footprint of the Bloom filter.

  Use -b 36 to keep the Bloom filter false positive rate low (~0.0005).
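To see how a setting like this relates to the false positive rate, here is a minimal sketch using the standard Bloom filter approximation FPR ≈ (1 - e^(-k/b))^k. It assumes that -b is interpreted as bits per distinct element and that the filter uses three hash functions; neither assumption is confirmed in this thread, and the function names are hypothetical, not part of ntHits.

```python
import math


def bloom_fpr(bits_per_element: float, num_hashes: int) -> float:
    """Approximate Bloom filter false positive rate.

    Standard textbook approximation (1 - e^(-k/b))^k, where b is the
    number of bits per inserted element and k the number of hash functions.
    """
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes


def bloom_size_bytes(bits_per_element: float, num_elements: int) -> int:
    """Memory needed for the bit array, in bytes (rounded up)."""
    return math.ceil(bits_per_element * num_elements / 8)


if __name__ == "__main__":
    # ASSUMPTION: three hash functions; the thread does not state this.
    num_hashes = 3
    # ASSUMPTION: an illustrative element count, not taken from any real run.
    num_elements = 1_000_000
    for b in (8, 16, 36):
        fpr = bloom_fpr(b, num_hashes)
        size = bloom_size_bytes(b, num_elements)
        print(f"b={b:2d}  FPR~{fpr:.6f}  size={size} bytes")
```

Under these assumptions, b = 36 gives an estimated rate of roughly 0.0005, in line with the figure quoted above, and the memory footprint grows linearly with b.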

We reported the effect of the ntHits -b parameter on the FPR, and on the ability of ntEdit to correct errors, at ISMB last year:

Please take a look at https://warrenlr.github.io/papers/ntedit_ismb2019.pdf (left panel).

In time, we will improve upon the documentation of ntHits.

Thank you for your interest in our tools.

Thank you!