FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data

OptiType killed OOM

Akazhiel opened this issue

Hello!
I'm trying to run OptiType on a cloud machine with 64 GB of RAM, using RNA samples of around 5 GB each because they are paired-end, and the process is killed because it runs out of memory. I have also run OptiType in a pipeline on DNA tumor-normal pairs, which are HLA-typed in parallel, without memory problems.

My question is: how much memory does OptiType need?

Best regards!

In my experience, 12GB is sufficient.
But I don't feed in the entire FASTQs, only the reads from the relevant HLA region on chr6.

Hello @Akazhiel, in my case a server with 128 GB of RAM often hits OOM when processing WES samples whose fastq.gz files are about 20 GB.
This is really frustrating, and I am looking for a workaround.

Perhaps extracting the chr6 reads from the BAM files and running OptiType on those would help?
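
A minimal sketch of that extraction, assuming samtools is on the PATH and the BAM is coordinate-sorted and indexed. The function name `extract_region_fastq`, the output naming, and the region string are all hypothetical; the contig may be called `chr6` or `6` depending on the reference. Pairs whose mate falls outside the region are discarded as singletons and unmapped reads are not included, so the result may differ from a run on the full FASTQs.

```shell
#!/usr/bin/env bash
# Sketch only: extract_region_fastq is a hypothetical helper, not part of OptiType.
# Assumes samtools is installed and the input BAM is coordinate-sorted and indexed.
extract_region_fastq() {
    local bam=$1      # indexed input BAM
    local region=$2   # e.g. "chr6" or "6", depending on the reference naming
    local prefix=$3   # prefix for the output files

    # Keep only the alignments overlapping the requested region.
    samtools view -b -o "${prefix}.region.bam" "$bam" "$region" || return 1

    # Group mates together by read name, then write paired FASTQ files.
    # Singletons and unflagged reads are sent to /dev/null.
    samtools collate -u -O "${prefix}.region.bam" \
        | samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
              -0 /dev/null -s /dev/null -n -
}

# Example call (file names are placeholders):
#   extract_region_fastq sample.bam chr6 sample_chr6
```

The two FASTQ files can then be passed to OptiType in place of the full-size inputs.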

Hello, is there a solution for this? I found that files with more than 100,000 reads trigger a kill signal, even though they are only 400-500 MB. It works well if I stay below that.

If you are running razers3 and it runs out of memory, you can try splitting the input file.

bgzip -cd [your fastq] | split -l 40000000 -a 5 -d --filter='razers3 -i 97 -m 99999 --distance-range 0 -pa -tc 0 -o $FILE.bam [your ref] /dev/stdin' /dev/stdin [split prefix]
samtools cat [split prefix]*.bam | samtools view -b -o res.bam -

Then give the merged BAM file to OptiType instead of the FASTQs.

You can use gzip instead of bgzip (but it is much slower), and you can adjust the split size (40M lines means 10M reads in my case, which needs about 4 GB of memory for alignment).
I strongly suggest using samtools view to recompress the BAM produced by samtools cat, because samtools cat merges BAM files with a trick that older decompressors may not support. Also, very important: if you write the samtools cat output directly to a file, make sure that file is not among the inputs, or the output will grow without bound.
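
Two small helpers can make these points concrete. This is only a sketch: `reads_to_lines` and `output_not_in_inputs` are hypothetical names, not part of razers3, samtools, or OptiType. The first assumes standard 4-line FASTQ records, so the value handed to `split -l` is always a multiple of four and no record is cut in half. The second refuses to proceed when the intended output path is also one of the inputs.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and not part of any tool named here.

# Convert a reads-per-chunk target into a line count for `split -l`.
# Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
reads_to_lines() {
    local reads=$1
    echo $(( reads * 4 ))
}

# Return non-zero if the intended output path is also one of the inputs.
output_not_in_inputs() {
    local out=$1
    shift
    local f
    for f in "$@"; do
        # Match on the literal path, or on two paths naming the same existing file.
        if [ "$f" = "$out" ] || [ "$f" -ef "$out" ]; then
            echo "refusing to run: output '$out' is also an input" >&2
            return 1
        fi
    done
    return 0
}

reads_to_lines 10000000   # prints 40000000, the chunk size quoted above
```

A call such as `output_not_in_inputs res.bam part*.bam && samtools cat -o res.bam part*.bam` (file names are placeholders) would then stop before samtools cat starts if the glob also matched res.bam.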

Another thing to note: you should force razers3 to run single-threaded (the -tc 0 in the command above). Its multi-threaded mode gives inconsistent output, which can cause small problems in the results, and multi-threading seems to give very little speed-up on a small reference anyway.

If you think splitting the FASTQ is not a good plan, you can also use bowtie2 to filter it.

bowtie2 --no-unal --very-sensitive-local --local --omit-sec-seq -p 10 --reorder .....(index and fastq)

Bowtie2 uses about 200 MB of memory (this does not grow as the input gets larger) and gives you a filtered BAM with the irrelevant reads removed. You can then convert that BAM to FASTQ for OptiType. However, I'm not sure this method gives output that is completely consistent with running directly on the raw FASTQ.