0xTCG / biser

A fast tool for detecting and decomposing segmental duplications in genome assemblies

Very long runtime, is it possible to resume a killed run?

EinarBaldvin opened this issue

Hi,

is there a way to resume a run that was killed?

The run was killed when it reached a 7-day time limit, using 54 cores, while it was in the putative alignment stage. The genome is quite large, ~4 Gbp.

biser -o output_file --temp /segdup/biser_temp -t 54 --hard genome.mod.MAKER.masked.fasta

Running BISER v1.2.3 on 1 genome(s): genome.mod.MAKER.masked

  1. Putative SD detection
    Search: 1566:30s (single: 5057:06s)

  2. Putative SD alignment

Best,
Einar

Hey @EinarBaldvin

Yes, BISER should resume the failed run.
However, the long runtime indicates that your genome is not masked properly. Which genome are you running it on?

Hi @inumanag ,

How do I get BISER to resume the run? It creates a new, randomly named run directory at every start.

It is a genome I assembled to chromosomal scale and masked with the EDTA pipeline, which conservatively hard-masked 54% of the genome. The run does not finish in 7 days with 54 cores and 2 TB of RAM.

I have now hard-masked all annotated TEs, or 84% of the genome, with bedtools maskfasta, but now, oddly, it needs more RAM than I can provide (2.8 TB), even with only 24 cores.

Any ideas?

Best,
Einar

Hi @EinarBaldvin

I will need a bit more information to debug this one. That is certainly unexpected: unless you have a 100 GB genome, the runs should finish in a few minutes on such a machine.

What is the size of the genome? Which species is it? Are you using hard- or soft-masking? Would it be possible for me to take a look at the data? Please reach out to me via email if you prefer sharing the details privately.
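The size and masking questions above can be answered straight from the FASTA file. Below is a minimal sketch, not part of BISER, that tallies the length and the hard- and soft-masked bases for each sequence. The function names `masking_stats` and `summarize` are hypothetical, and treating a lowercase `n` as hard-masked is an assumption.

```python
from typing import Dict, Iterable, Tuple

# (length, hard-masked bases, soft-masked bases)
Counts = Tuple[int, int, int]


def masking_stats(lines: Iterable[str]) -> Dict[str, Counts]:
    """Tally length and masked bases per sequence from FASTA lines."""
    stats: Dict[str, Counts] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            stats[name] = (0, 0, 0)
            continue
        if name is None:
            continue  # ignore anything before the first header
        length, hard, soft = stats[name]
        # Assumption: both "N" and "n" count as hard-masked.
        hard += line.count("N") + line.count("n")
        # Any other lowercase base counts as soft-masked.
        soft += sum(1 for c in line if c.islower() and c != "n")
        stats[name] = (length + len(line), hard, soft)
    return stats


def summarize(stats: Dict[str, Counts]) -> Tuple[int, float, float]:
    """Return (total length, hard-masked fraction, soft-masked fraction)."""
    total = sum(length for length, _, _ in stats.values())
    if total == 0:
        return 0, 0.0, 0.0
    hard = sum(h for _, h, _ in stats.values())
    soft = sum(s for _, _, s in stats.values())
    return total, hard / total, soft / total


# Tiny in-memory example; for a real genome, pass an open file handle instead.
demo = [">chr1 example", "ACGTNNNN", "acgtACGT", ">chr2", "NNNNnn"]
demo_stats = masking_stats(demo)
demo_summary = summarize(demo_stats)
```

For a real assembly, `with open("genome.mod.MAKER.masked.fasta") as fh: stats = masking_stats(fh)` streams the file line by line, so memory use stays small even for a multi-Gbp genome.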

Fixed in 1.3. Thank you for the report!

To make your run faster, use --max_chromosome_size=100_000_000. The search step completes in 20 minutes (16 cores); however, the alignment takes 10+ hours due to the insane number of repeats in the barley genome.

You can add --max-error=20 --max-edit-error=10 for a less sensitive but faster mode (10% edit error and 20% gapped error allowed). This mode might even be preferable for such genomes.
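For reference, the suggestions above can be combined with the original invocation. The sketch below only assembles and prints the command line; every flag is copied verbatim from this thread, while the helper name `build_biser_command` is hypothetical and the assumption that the flags can be mixed in this order has not been checked against BISER 1.3.

```python
import shlex
from typing import List


def build_biser_command(genome: str, output: str, temp_dir: str,
                        threads: int, fast: bool = True) -> List[str]:
    """Assemble a BISER command line from the flags quoted in this thread."""
    cmd = [
        "biser",
        "-o", output,
        "--temp", temp_dir,
        "-t", str(threads),
        "--hard",
        # Spelled exactly as quoted in the thread.
        "--max_chromosome_size=100_000_000",
    ]
    if fast:
        # Less sensitive but faster mode suggested in the thread.
        cmd += ["--max-error=20", "--max-edit-error=10"]
    cmd.append(genome)
    return cmd


command = build_biser_command(
    genome="genome.mod.MAKER.masked.fasta",
    output="output_file",
    temp_dir="/segdup/biser_temp",
    threads=54,
)
# Print the command for review rather than running it.
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would launch the run; printing it first makes it easy to review the flags before committing cluster time.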