luntergroup / octopus

Bayesian haplotype-based mutation calling

very slow call set refinement with refcall

nemartins opened this issue

The call set refinement step of octopus (v0.7.4 or development) runs very slowly when the refcall option is used: about 50 min for a 1.5 Mb reference genome at 50x coverage.

The initial calling is very quick, around 1 minute, but then the run stalls. I've tried changing the available memory (increasing or decreasing the -B value) and disabling filtering, but the issue remains.

Without refcall, the full run finishes in about 2 min.

Do you have an idea what's happening?

Hi, this is unfortunately a known performance bug. As part of filtering, octopus realigns all reads to the called haplotypes. That is fine for typical whole-genome variant call sets, but it becomes very expensive when reference calls are included. It's on my TODO list to find a workaround for this.
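
For anyone who wants to measure the slowdown on their own data, here is a minimal sketch that times one run without refcall and one with it. It assumes `octopus` is on the PATH and uses only the `-R`, `-I`, `-o` and `--refcall` options; the helper names and the file names are hypothetical, so check the options against `octopus --help` for your version.

```python
import subprocess
import time
from typing import List, Sequence


def build_commands(reference: str, reads: str, out_prefix: str) -> List[List[str]]:
    """Return two octopus command lines: one without and one with --refcall.

    The file names are placeholders supplied by the caller (assumption).
    """
    base = ["octopus", "-R", reference, "-I", reads]
    without_refcall = base + ["-o", f"{out_prefix}.variants.vcf"]
    # --refcall goes last so its optional argument cannot swallow another token.
    with_refcall = base + ["-o", f"{out_prefix}.refcall.vcf", "--refcall"]
    return [without_refcall, with_refcall]


def time_command(argv: Sequence[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(list(argv), check=True)  # raises if the command fails
    return time.perf_counter() - start
```

Calling `time_command` on each list returned by `build_commands("ref.fa", "reads.bam", "run")` gives the two wall-clock times to compare. This only measures whole runs; it does not separate the initial calling from the refinement step.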